Preface


Digital Twin: Architectures, Networks, and Applications offers comprehensive,
self-contained knowledge on Digital Twin (DT), a highly promising technology
for achieving digital intelligence and a digitally transformed society. DT is a key
technology for connecting physical systems and digital spaces. A digital twin is
defined as the real-time digital replica of a real-world physical object. A digital twin
in the digital space can monitor, design, analyze, optimize, and predict physical
systems. The bi-directional interaction between physical spaces and digital spaces
brings many advantages, including low maintenance cost, reduced security risk, and
substantially increased Quality-of-Service. Digital twin can also enable unprecedented
applications and services, ranging from Extended Reality (XR) and immersive
multimedia to remote medical care, autonomous driving, Web 3.0, and the Metaverse.

The objectives of this book are to provide the basic concepts of DT, to explore 
the promising applications of DT integrated with emerging technologies, and to give 
insights into the possible future directions of DT. For easy understanding, this book 
also presents several use cases for DT models and applications in different scenarios. 
This book has the following salient features: 


* Provides a comprehensive reference on state-of-the-art technologies for digital 
twin 

* Covers basic concepts, techniques, research topics and future directions 

* Contains illustrative figures that enable easy understanding of digital twin 

* Allows complete cross-referencing owing to the broad coverage on digital twin 

* Identifies the unique challenges for efficiently improving the performance of 
digital twin networks 


This book allows an easy cross-reference owing to the broad coverage on both the 
principle and applications of DT. It provides a comprehensive technical guide cover- 
ing basic concepts, innovative techniques, fundamental research challenges, recent 
advances and future directions on DT. The book starts with the basic concepts, mod- 
els, and network architectures of DT. Then, we present the new opportunities when 
DT meets edge computing, Blockchain and Artificial Intelligence, and distributed 
machine learning (e.g., federated learning, multi-agent deep reinforcement learning). 



In the last part, we present a wide range of applications of DT as an enabling
technology for 6G networks, Aerial-Ground Networks, and Unmanned Aerial Vehicles
(UAVs). We also identify the future directions of DT in Reconfigurable Intelligent
Surface (RIS) and Internet of Vehicles.

The primary audience includes senior undergraduates, postgraduates, educators,
scientists, researchers, engineers, innovators, and research strategists. This book is
mainly designed for researchers in both academia and industry who work in the
fields of telecommunications, computer science and engineering, and digitalization.
Students majoring in computer science, electronics, and communications will also
benefit from this book.


October, 2023 Yan Zhang 

Chapter 1
Introduction 


Abstract This chapter first gives an overview of digital twin, including the devel- 
opment timeline and possible application areas. The features of digital twin are 
summarized in detail. Then, the concepts, fundamentals, and visions of digital twin 
are presented. 


1.1 Overview of Digital Twin 


In recent years, digital twin has emerged as one of the most promising enabling 
technologies for sixth-generation (6G) mobile networks. Both academia and industry
have shown increasing interest in unlocking the potential applications of digital twin 
in a range of areas, including smart cities, intelligent transportation, healthcare, 
energy, and Industrial Internet of Things (IIoT). 

The overall development timeline of digital twin is shown in Fig. 1.1. The term 
digital twin was first introduced in 2002 by Michael Grieves in a presentation about 
product life cycle management. Thereafter, Framling et al. [1] proposed an agent- 
based architecture that maintains a corresponding virtual counterpart or agent for 
each product item with a faithful view of the product status and information [2]. These 
works advanced the initial exploration of digital twins from a conceptual point of 
view. Before 2010, NASA put the digital twin concept into practical application by 
developing two identical space vehicles for the Apollo project that could simulate 
and reflect the flight status in training. Since then, the idea of digital twins has been 
explored in areas such as aircraft maintenance and air force product management 
[3]. In 2017, Grieves gave the formal definition of digital twin in a digital twin white 
paper [4]. This white paper presented the basic digital twin model, which consists of 
physical objects, virtual objects, and a data link between physical space and virtual 
space. Recently, digital twin has been widely investigated in the areas of the IIoT and 
manufacturing for applications such as predictive diagnosis, production planning, 
and performance optimization [5]. Gartner listed digital twin as one of the top 10 
strategic technologies in 2017, predicting that millions of things would have digital 
twin representations within three to five years. Gartner also listed digital twin as
one of the top 10 strategic technologies in the next two years [6, 7], which shows
industry’s great confidence in digital twin technology. Gartner’s top 10 strategic
technologies for 2018 are shown in Fig. 1.2, with digital twin listed at the peak of
inflated expectations.

[Figure 1.1: timeline. 2002, Michael Grieves: puts forward the concept of product
lifecycle management (PLM) in a speech for the first time, proposing a concept
similar to DT. 2003, Framling et al.: an agent-based architecture where each product
item has a corresponding virtual counterpart or agent associated with it. 2010, NASA:
starts investigating and developing DTs for its space assets. 2013, U.S. Air Force:
proposes the concepts of Digital Thread and Digital Twin, where the digital thread
refers to the communication framework used to digitally link all product data. 2015
and 2017, Michael Grieves: formally defines DT in the white paper as (1) a real space
containing a physical object; (2) a virtual space containing a virtual object; (3) the
link for data flow from real space to virtual space (and virtual sub-spaces), and for
information flow from virtual space (and sub-spaces) to real space. 2017 to present:
applications of DT in IoT are proposed.]

Fig. 1.1 The development timeline of digital twin

More recently, digital twin has attracted a great deal of attention and has been 
widely explored in aviation, healthcare, smart cities, intelligent transportation, urban 
intelligence, and future wireless communications. The digital twin fulfils the role 
of collecting the real-time and historical running status of physical objects and 
making corresponding predictions and optimized decisions to improve the running 
performance of physical systems. In the field of aviation, digital twin has been used 
for aircraft maintenance, structuring, and risk prediction. For example, the authors in 
[8] proposed leveraging digital twin to model aircraft wings in the detection of fatigue 
cracks. In healthcare, with the assistance of wearable sensors and Internet of Things 
(IoT) devices, digital twin can be used to collect detailed physiological status and 
medication data about patients, which can help to monitor their medical condition 
and provide them with advanced medical care. In intelligent transportation systems, 
digital twins can help manage traffic, plan driving paths, and maintain transportation 
facilities. Real-time traffic conditions and the status of transportation facilities can 
be mirrored and analysed by digital twins in these transportation systems. Moreover, 
although the application of digital twins to scenarios of wireless communications is 
still in its infancy, several works can already be found that introduce digital twin to 
wireless networks to improve overall performance. For example, in [9], the authors 
proposed a new architecture, that is, digital twin edge networks, by integrating digital
twins with blockchain to provide secure and accurate edge computing in multi-access
edge computing systems.

Fig. 1.2 The hype cycle of emerging technologies and digital twin [6]


1.2 Digital Twin Concepts, Features, and Visions 


Recent studies have provided a series of definitions for digital twin. In our work, we 
categorize the definitions into three types: virtual mirror model-based definitions, 
computerized model-based definitions, and integrated system-based definitions. In
the virtual mirror model-based definitions, a digital twin is defined as the virtual 
representation of a physical product, process, or system [10]. However, in this defi- 
nition, the interaction between physical objects and digital space is neglected. The 
status change of physical objects can hardly be reflected by the digital model after its 
creation. In the computerized model-based definitions [11], digital twins are treated 
as computerized models, which can be simulation processes or a series of software. 
The performance of physical objects can be improved through prediction, real-time 
control, and optimization. In the integrated system-based definitions, digital twins
are regarded as an integration of physical objects, virtual twins, and related data [12]. 
The real-time interaction between physical objects and virtual twins is emphasized 
in this definition. The virtual twin collects the update information from physical 
objects and makes corresponding predictions of their future state. 

[Figure 1.3 labels: DT virtual model; planning, design, optimization, maintenance, operations]


Fig. 1.3 Illustration of the digital twin model 


Based on the above analysis, we can give a comprehensive definition of digital 
twin. Digital twins are accurate digital replicas of real-world objects across multiple 
granularity levels, and the real-world objects can be devices, machines, robots, 
industrial processes, or complex physical systems. As shown in Fig. 1.3, the digital 
twin in virtual space is composed of a virtual model, corresponding running data, 
and analytical tools. The interaction between real-world objects and digital space is 
bidirectional: on one side, physical systems transmit real-time data to the virtual space 
for building digital twin models; on the other, digital twins analyse the collected data, 
update digital twin models, and provide physical objects with optimization policies 
to improve their operation performance. 
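
The bidirectional interaction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an implementation taken from the literature: the names `PhysicalObject`, `DigitalTwin`, and `run_loop`, the temperature variable, the toy dynamics, and the proportional rule standing in for an "optimization policy" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalObject:
    """Hypothetical physical system with one measurable state variable."""
    temperature: float = 30.0
    power: float = 0.0  # control input received from the twin

    def step(self) -> float:
        # Toy dynamics (assumed): the control input shifts the temperature directly.
        self.temperature += self.power
        return self.temperature


@dataclass
class DigitalTwin:
    """Hypothetical twin: mirrored state, running data, and a simple analysis rule."""
    target: float
    gain: float = 0.5
    mirrored_temperature: float = 0.0
    history: list = field(default_factory=list)

    def synchronize(self, reading: float) -> None:
        # Physical-to-virtual direction: update the virtual model with real data.
        self.mirrored_temperature = reading
        self.history.append(reading)

    def decide(self) -> float:
        # Virtual-to-physical direction: return a command for the physical object.
        return self.gain * (self.target - self.mirrored_temperature)


def run_loop(steps: int = 20) -> tuple:
    """Alternate synchronization and feedback for a fixed number of steps."""
    obj = PhysicalObject()
    twin = DigitalTwin(target=20.0)
    for _ in range(steps):
        twin.synchronize(obj.step())  # data flows to the digital space
        obj.power = twin.decide()     # policy flows back to the physical space
    return obj, twin
```

With these assumed dynamics, the gap between the measured temperature and the target halves at every step, which shows how feedback from the twin steers the physical object.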

A complete digital twin model consists of three components: the data, the model, 
and software, as shown in Fig. 1.4. 


* Data, the foundation: Because the establishment of digital twins relies heavily 
on historical and real-time running data, data are the foundation of the entire 
digital twin model. The physical systems contain data, the running principle of 
the entities, and the controller that can adjust the running of the physical systems. 

* The model, the core component: The models (both theory models and
data-driven models) are the core component of digital twins. The theory models
are constructed based on the principles from the physical systems. The data- 
driven models are trained by collecting the large amounts of historical and real- 
time running data from the physical systems. A variety of techniques, including 
artificial intelligence (AI) and data processing, can be used to train the data-driven 
model. Learning models from data is an iterative process, where the models are 
trained and updated constantly, as a self-learning process. 

* Software, the essential carrier: Software is the carrier of digital twins and provides 
the interface between physical systems and digital space. The digital twin models, 
composed of algorithms, code, and software, are implemented through developing 
corresponding software. The functions of representation, diagnosis, prediction, 
and decision are deployed in a software-defined way that provides the physical 
controller with optimized command to improve the running performance of the 
physical systems. 
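
To make the idea of a constantly updated data-driven model concrete, the sketch below refines a linear model one sample at a time. It is only an illustration: the class `OnlineLinearModel`, the helper `train_on_stream`, the learning rate, and the synthetic noise-free data stream (generated from y = 2x + 1) are assumptions made for this example, and a real twin would use far richer models.

```python
class OnlineLinearModel:
    """Data-driven model y ~ w * x + b, refined with every new sample."""

    def __init__(self, learning_rate: float = 0.5) -> None:
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def update(self, x: float, y: float) -> float:
        # One stochastic-gradient step on the squared prediction error.
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error
        return error


def train_on_stream(model: OnlineLinearModel, samples, passes: int = 400) -> OnlineLinearModel:
    """Replay a stream of (x, y) running data, updating the model each time."""
    for _ in range(passes):
        for x, y in samples:
            model.update(x, y)
    return model


# Synthetic running data (assumed): the physical process follows y = 2x + 1.
samples = [(i / 4, 2.0 * (i / 4) + 1.0) for i in range(5)]
model = train_on_stream(OnlineLinearModel(), samples)
```

The same `update` call can be issued whenever a new measurement arrives, which is the self-learning behaviour the list above attributes to data-driven models.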

The digital twin provides precise mapping from the physical world to digital
space, with real-time interactions. Digital twin mitigates the huge gap between 
physical space and digital space through continuous synchronization and updates. 
The features of digital twins can be summarized as follows. 


* Precise mapping: Digital twin establishes the mapping between physical objects 
and digital space. The historical data and current running status of physical objects 
are synchronized to the digital space for further processing and analysis. Based 
on the transmitted data, digital twins can completely reflect the status of physical 
objects and establish full mapping between the physical space and digital space. 
The mapping should completely reflect the full state of physical objects, with low 
mapping error. 

* Real time: Different from conventional simulation and modelling technologies, 
digital twins keep synchronizing with physical objects in real time. The collected 
data are computed on the digital twin side to extract the corresponding status and 
build the model of physical objects. Communication is also executed continuously 
to update the digital twin models. Thus, real-time edge computing should be 
implemented to ensure the timeliness of digital twins. 

* Distributed: The physical objects of digital twins can be sensors, IoT devices, and 
physical systems. In digital twin-assisted wireless networks, multiple physical
entities are distributed across the network. The digital twins of these entities 
are also distributed among different edge servers. In such cases, distributed AI 
techniques are required to model digital twins from distributed physical objects. 

* Intelligent: In addition to reflecting the running status of physical objects through 
real-time data, digital twins also incorporate running models of physical objects. 
Intelligent techniques, especially AI algorithms, can be used to build digital 
twin models by processing and analysing the large amounts of running data. 
With the help of the constructed models, digital twins can provide physical 
systems with optimization, decisions, and predictions. For example, in intelligent 
transportation, the digital twin of road traffic can help drivers decide on the optimal 
path by analysing real-time traffic conditions and predicting traffic conditions in
the near future.

* Bidirectional: The interactions between digital twins and physical objects are
bidirectional: physical objects transmit and update their running data to digital 
twins, and digital twins provide physical objects with optimization decisions. The 
real-time feedback from digital twins to physical objects is one of the unique 
characteristics of digital twin technology. 
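
Two of the features above, precise mapping and real-time synchronization, can be expressed as simple checks. The sketch below is one possible formulation, not a standard definition: the root-mean-square form of the mapping error, the function names, and the freshness threshold `max_age_s` are assumptions chosen for illustration.

```python
import math


def mapping_error(physical_state: dict, twin_state: dict) -> float:
    """Root-mean-square gap between the physical state and its mirrored copy.

    Every variable of the physical state must also exist in the twin state;
    a missing variable raises KeyError, signalling an incomplete mapping.
    """
    diffs = [physical_state[name] - twin_state[name] for name in physical_state]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def is_synchronized(last_update_s: float, now_s: float, max_age_s: float) -> bool:
    """Return True when the twin's latest update is no older than max_age_s seconds."""
    return (now_s - last_update_s) <= max_age_s
```

A twin could, for example, be treated as usable for control only while `mapping_error` stays below an application-specific bound and `is_synchronized` returns True.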


Digital twins can provide physical systems with the following benefits through
optimization, prediction, and automation processes.


* Higher performance: Digital twin can improve system performance by making
optimal decisions and executing operations to adjust and control the running
of physical systems. In addition, the planning and design of physical systems,
such as industrial equipment and healthcare products, can be implemented by
using a digital twin to simulate the real-world running performance. Thus, the
performance of physical systems can be improved by using digital twins to collect
their real-time data and instruct their further operation.

* Closer monitoring: Digital twins should copy the complete status of physical
objects. To achieve this, physical objects continuously update their running data
at a digital twin server. Digital twin servers can closely monitor physical objects
in a proactive way that could not be achieved by human operators or conventional
monitoring instruments. Digital twins integrate historical data, real-time data, and
predicted data to track past states, monitor the current status of physical objects,
and predict future conditions for making optimization decisions.

* Lower maintenance costs: By collecting the real-time states of physical objects
and systems, digital twins can provide optimal maintenance strategies for
executing real-time operations. Conventional scheduled maintenance is usually
determined in the design phase of physical objects, which leads to high costs and
low maintenance efficiency. Digital twins can perform predictive maintenance
based on the real-time status of physical objects, which considerably reduces
maintenance costs.

* Increased reliability: Digital twins can be used to provide virtual tests or
simulations of the running of physical objects or systems. The timely digital
twin-assisted assessment can improve the quality of physical objects and enhance
the reliability of real-world systems.

* Lower physical system failure risks: Certain operations can cause physical system
failure, resulting in high loss and damage. Digital twins can provide more realistic
and accurate simulations for various operations. The simulated environment
provided by digital twins closely mirrors real-world conditions. Thus, the operations
of real-world objects and systems can be precisely explored, simulated, and tested
to avoid detrimental impacts on physical systems.
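
The contrast between scheduled and predictive maintenance in the list above can be illustrated with a small calculation. The sketch below fits a straight line to wear readings and extrapolates when a failure threshold would be crossed. The linear degradation trend, the function names, and the threshold value are assumptions for this example; practical systems typically rely on more elaborate degradation models.

```python
def fit_line(times, values):
    """Ordinary least-squares fit of values ~ slope * time + intercept."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def predicted_failure_time(times, values, threshold):
    """Time at which the fitted wear trend reaches the threshold.

    Returns None when the trend is flat or decreasing, that is, when no
    crossing is predicted from the data observed so far.
    """
    slope, intercept = fit_line(times, values)
    if slope <= 0:
        return None
    return (threshold - intercept) / slope
```

For wear readings of 1.0, 1.5, 2.0, and 2.5 at times 0 to 3 and a threshold of 5.0, the fitted trend crosses the threshold at time 8, so maintenance can be planned shortly before that point instead of at a fixed interval chosen in the design phase.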


With the aforementioned benefits, digital twins can be applied in a variety of
scenarios to enhance physical system performance.


* Smart manufacturing: Traditional manufacturing faces the problems of limited
production efficiency and long product life cycles. Although historical data and
simulation have been applied, the interactions are not real time and real-time
data are poorly utilized. The connection between physical objects and
virtual space is the key challenge for smart manufacturing in the era of Industry
4.0. Digital twins can integrate physical systems with digital space by analysing
huge amounts of historical and real-time data throughout a product life cycle. The
results from digital space can provide instructions to the products and processes
of physical space. In this way, operation instructions can be optimized, and the
quality and efficiency of the manufacturing process can be improved.

* Aviation: Aviation was the first area to apply digital twins to practical
scenarios. Digital twin has improved data processing and problem diagnosis in aircraft
maintenance, risk prediction, construction, and self-maintenance. The full life 
cycle status of aircraft can be monitored and evaluated by digital twins through 
real-time data analysis. Real-time operations, such as for flight routes and predic- 
tive maintenance, can be determined and optimized with the help of digital twins. 
However, some challenges remain to be addressed in this area. Because aircraft 
need precise control and instructions to ensure flight safety, digital twins should 
be modeled with high accuracy. However, unreliable communications between 
physical aircraft and digital twin servers can decrease robustness and increase the 
error rate of digital twin models, which is one of the critical challenges for digital 
twin-assisted aviation. 

* Intelligent transportation: Conventional transportation systems have faced seri- 
ous problems such as traffic jams and accidents. With the assistance of electronic 
sensors, data analysis, and intelligent control, digital twins can help to improve 
traffic management and optimize transportation planning efficiency. Traffic acci- 
dents can be effectively predicted and avoided through the real-time monitoring 
and analysis of digital twins. In addition, digital twins can make optimal mainte- 
nance decisions for transportation facilities, based on simulations and evaluations 
of their usage. However, digital twins also face several challenges in this area. 
The dynamic and time-varying traffic environment poses critical challenges for 
establishing accurate transportation digital twin models. In addition, the large 
amounts of data that contain the running states and information of smart vehicles 
must be transmitted to digital twin servers. Unreliable communication conditions 
and high transmission latency increase the difficulty of building perfect digital 
twins. 

* Healthcare: With the help of IoT devices, it is possible to establish a digital twin 
for the human body by using IoT sensors and intelligent monitoring equipment 
to detect a patient's health condition. The physiological status, medication in- 
put information, emotional state, and lifestyle of a patient can be collected and 
analysed in real time by a digital twin. The full range of medical care can be 
provided by closely monitoring the patient's status and predicting their future 
health condition. In addition, digital twins can be used in short-term scenarios, 
such as in remote surgery. For example, experts can obtain real-time feedback 
by performing operations on the patient's digital twin and identifying potential 
emergencies that could occur during real-world surgery. Moreover, digital twins 
can play an important role in the monitoring, management, and maintenance of 
medical devices. However, given the sensitive nature of patient information, the
privacy and security of this data must be treated seriously. Emerging technolo- 
gies such as privacy computing and blockchain have the potential to improve the 
protection level of sensitive data. 

* 6G Networks: 6G is predicted to realize the visions of global coverage, enhanced 
efficiency, fully connected intelligence, and enhanced security. With such visions, 
tremendous amounts of data must be processed at the edge of the network to 
provide ultra-low-latency services. In addition, security and privacy issues in data 
processing need to be addressed. The emergence of digital twins opens up new 
possibilities to address these challenges for 6G networks. Digital twin technology 
could be used in various ways to improve the performance of 6G networks. For 
example, in 6G terahertz communication, digital twin can be used to model, 
predict, and control signal propagation to maximize the signal-to-noise ratio. In 
addition, the real-time mirror of physical systems can help to mitigate unreliable 
and long-distance communications between end users and servers in 6G networks. 
Digital twins can bridge the huge gap between physical systems and digital space 
in 6G networks, which can enhance the robustness of wireless connectivity and 
the intelligence of connected devices. Moreover, network facilities, such as mobile 
cell towers, can be monitored, planned, and maintained by using digital twins to 
simulate and evaluate their real-time status. 


1.3 Book Organization 


This book aims to provide a comprehensive view of digital twin, including fun- 
damentals, visions, and applications. As an emerging technology, digital twin will 
play an important role in various fields, including manufacturing, transportation, and 
future networks. However, the high requirements of digital twin, such as ultra-low 
latency, huge amounts of data transmission, and distributed processing, pose new 
challenges to the implementation and application of this technology. 

To help us confront these challenges, this work provides a comprehensive theory 
of digital twin and discusses enabling technologies and applications of this paradigm. 
We first present the fundamental principles of digital twin, including concepts, ar- 
chitectures, features, and visions. Next, we provide digital twin modeling methods 
and digital twin networks. We also discuss a number of enabling technologies for 
digital twin, including AI, edge computing, and blockchain. Moreover, we discuss 
research opportunities for digital twin in the emerging scenarios such as 6G, un- 
manned aerial vehicles, and Internet of Vehicles. This book is organized as follows. 
Chapter 2 presents digital twin models and digital twin networks. Chapter 3 discusses
the use of AI in digital twin, including deep reinforcement learning and federated
learning. Chapter 4 describes the integration of digital twin with edge computing and
presents in detail the new architecture of digital twin edge networks. The application
of edge intelligence for digital twin is also introduced. Incorporating blockchain into
digital twin is discussed in Chapter 5. Chapter 6 details the application of digital twin
in 6G networks. Finally, Chapters 7 and 8 describe the application of digital twin to
unmanned aerial vehicles and Internet of Vehicles.

Chapter 2
Digital Twin Models and Networks 


Abstract A digital twin (DT) model reflects one or a group of physical objects that 
exist in a complex real system to a virtual space. By interconnecting and coordi- 
nating multiple independent models, a DT network (DTN) can be built to map the 
associations and interactions between physical objects. In this chapter, we present 
DT models in terms of modelling frameworks, modelling methods, and modelling 
challenges. Then we elaborate the concept of DTNs and compare it with the concept 
of DT. The communication mechanisms, application scenarios, and open research 
issues of DTNs are then discussed. 


2.1 Digital Twin Models 


The main goal of DT technology is to reflect the physical world into a virtual space 
composed of DT models corresponding to different physical objects. As the basic 
element for realizing the DT function, a DT model describes the characteristics of 
objects in multiple temporal and spatial dimensions. More specifically, the model 
always contains the physical object's geometric structure, real-time status, and back- 
ground information and can further include a fully digital representation of the 
object's interaction interfaces, software configuration, behavioural trends, and so 
forth. In this section, we review the DT modelling framework and introduce three 
categories of DT modelling approaches. Moreover, we discuss the challenges and 
unexplored problems of DT modelling. 

The framework acts as a roadmap of the DT modelling process, which guides 
the twin system planning, digital model design, mapping step implementation, and 
performance evaluation. In particular, the DT modelling framework breaks down the 
complex modelling process into explicit parts and helps to elucidate the factors or 
interactions that affect the mapping accuracy. Several previous works have focused 
on the DT modelling framework. 

A general and standard framework for DT modelling was first built by Grieves 
[4]. In the framework, the DT model was described in three dimensions, that is, the 
physical entity, the virtual model, and the connection between the physical and virtual
parts. This framework has been widely applied to guide DT model construction for 
industrial production. 

Inspired by Grieves’ general framework, several studies have extended its hier- 
archical structure. For instance, Liu et al. [13] presented a four-layer DT modelling 
framework consisting of a data assurance layer, a modelling calculation layer, a DT 
function layer, and an immersive experience layer. Schroeder et al. [14] further
introduced a framework composed of a device layer, a user interface layer, a web 
service layer, a query layer, and a data repository layer. Compared with Grieves' 
framework, the frameworks proposed in [13] and [14] consider the interactions be- 
tween the users and DT models, in addition to physical-virtual interactions, and 
further emphasize the function of DT. 

Different from the aforementioned studies, Tao et al. [5] described the DT model 
architecture from the perspective of components. The authors proposed a five- 
dimensional DT model that encompasses the physical part, the virtual part, the 
data, connections, and services. This multidimensional framework fuses the data 
from both the physical and virtual aspects into a DT model to comprehensively and 
accurately capture the features of the physical objects. Moreover, the framework can 
encapsulate DT functions, such as environment detection, action judgement, and 
trend prediction, into the unified management of virtual systems and the on-demand 
use of twin data. The framework in [5] mainly highlights the influence of system 
characteristics composed of physical data, virtual data, service data, and historical 
experience on both virtual twins and mapping services. Due to the completeness of 
the architecture in terms of DT system composition and element association analysis, 
it has become one of the main references in the DT modelling process. 
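
As a rough illustration of how the five dimensions fit together, the sketch below groups them in a single Python data structure. It is not the model of [5] itself: the class name `FiveDimensionalTwin`, its field types, the `ingest` and `call_service` helpers, and the toy trend-prediction service are hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FiveDimensionalTwin:
    """Toy container for the five dimensions: physical, virtual, data, connections, services."""
    physical_entity: str                                            # physical part (identifier only)
    virtual_model: Dict[str, float] = field(default_factory=dict)   # virtual part (latest state)
    twin_data: List[Dict[str, float]] = field(default_factory=list) # fused running data
    connections: List[str] = field(default_factory=list)            # links between the parts
    services: Dict[str, Callable] = field(default_factory=dict)     # encapsulated DT functions

    def ingest(self, sample: Dict[str, float]) -> None:
        # Store the sample as twin data and refresh the virtual model with it.
        self.twin_data.append(dict(sample))
        self.virtual_model.update(sample)

    def call_service(self, name: str) -> float:
        # Services use twin data on demand, as in the framework's service dimension.
        return self.services[name](self)


def predict_next_temperature(twin: FiveDimensionalTwin) -> float:
    """Assumed service: linear extrapolation from the last two temperature samples."""
    previous, latest = twin.twin_data[-2], twin.twin_data[-1]
    return latest["temperature"] + (latest["temperature"] - previous["temperature"])
```

Registering `predict_next_temperature` under a key such as "trend_prediction" and calling it through `call_service` mirrors the way the framework encapsulates functions like trend prediction behind a service interface.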


2.1.1 DT Modelling Methods 


Along with the advancement of wireless technologies and the ever-increasing de- 
mand for ubiquitous Internet of Things (IoT) services, a vast number of intercon- 
nected smart devices and powerful infrastructures have spread around the world, 
making physical systems much more complex and diverse while adding significant 
difficulty to modelling physical objects in virtual space. In response to this prob- 
lem, three types of DT modelling approaches catering to different physical systems 
and application requirements have been introduced: a specific modelling method 
limited to a given application field, a multidimensional modelling method with mul- 
tiple functions, and a standard modelling approach for generic DT models. Figure 
2.1 compares these modelling approaches in terms of their applicable scenarios, 
advantages, and disadvantages. 

Fig. 2.1 Comparison between different DT modelling approaches

Specific modelling refers to a method that selects only the parameters most relevant
to a given application scenario as the input data for the mapping and uses
a unique mathematical model for the object's model construction. For instance, in
[15], specific DT modelling for a power converter was described as a real-time
probabilistic simulation process with stochastic variables developed through polynomial
chaos expansion. The most important consideration in this scenario was the energy
efficiency of the converter, so only parameters relevant to this objective were used
as input data. Consequently, this converter DT model has a significantly lower
computational cost than similar models. Similarly, in [16], a DT for structural health
monitoring based on deep learning was proposed to perform real-time monitoring
and active maintenance for bridges. In this work, the modelling method focused on
mechanical calculus and quality assessment.
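
For readers unfamiliar with polynomial chaos expansion, its generic form is sketched below. This is the standard textbook formulation, not the specific converter model of [15]; the symbols $Y$ (the model output), $\boldsymbol{\xi}$ (the random input variables), $\Psi_k$ (polynomials orthogonal with respect to the distribution of $\boldsymbol{\xi}$, with $\Psi_0 = 1$), $c_k$ (expansion coefficients), and the truncation order $P$ are notation assumed here.

```latex
\begin{align}
  Y(\boldsymbol{\xi}) &\approx \sum_{k=0}^{P} c_k \, \Psi_k(\boldsymbol{\xi}),
  &
  c_k &= \frac{\mathbb{E}\left[ Y \, \Psi_k \right]}{\mathbb{E}\left[ \Psi_k^2 \right]}, \\
  \mathbb{E}[Y] &\approx c_0,
  &
  \operatorname{Var}[Y] &\approx \sum_{k=1}^{P} c_k^2 \, \mathbb{E}\left[ \Psi_k^2 \right].
\end{align}
```

Because the mean and variance follow directly from the coefficients, such a surrogate can be evaluated much more cheaply than repeated sampling of the full model, which is consistent with the low computational cost reported for this kind of specific model.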

Benefiting from its specificity, the specific DT modelling approach can in theory be
adapted closely to given environmental characteristics and meet particular
application requirements. However, due to dynamic and nonlinear relations between
physical objects, in most complex application scenarios it is very challenging to
generate accurate system mapping in virtual space through a single
mathematical model. Multidimensional DT modelling based on associated
mathematical models is a promising way to address this challenge.

The multidimensional modelling approach decomposes the entire DT model con- 
struction into several submodel building processes, where each submodel corre- 
sponds to an explicit task requirement or mapping function. Some work has adopted 
this modelling approach. In [17], the individual combat quadrotor unmanned aerial 
vehicle (UAV) model is constructed as a combination of multiple specific models, 
including a geometric model, an aerodynamics model, a double closed-loop control 
behaviour model, and a rule model. In the DT modelling process, each submodel can use
dedicated software, extract a subset of the parameters, and reflect one aspect of the
physical object. For instance, the three-dimensional (3D) modelling software SolidWorks
has been leveraged to build the geometric model of the quadrotor UAV. Position 
coordinates, inertia moment, materials, and other parameters of the UAV are set ac- 
cording to the actual physical conditions. The aerodynamics model is used to realize 
the flight of the UAV model in the virtual environment. Moreover, a double closed- 
loop cascade control behaviour model is adopted to ensure the accurate mapping of 
the UAV. Through iterative optimization, feedback, updates, and adjustment of the 
UAV’s position and altitude parameters, a highly efficient and accurate DT model is 
ultimately achieved. 
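
The double closed-loop (cascade) control idea mentioned above can be illustrated with a minimal Python sketch. It is not the controller used in [17]: it assumes a one-dimensional altitude model with unit mass and no gravity or drag, and the function name, gains, and time step are hypothetical values chosen only for illustration. The outer loop turns the position error into a velocity setpoint, and the inner loop turns the velocity error into an acceleration command.

```python
def simulate_cascade_altitude(target_alt, steps=2000, dt=0.01,
                              kp_outer=1.0, kp_inner=4.0):
    """Simulate a 1-D altitude twin driven by two nested proportional loops.

    All parameters are illustrative assumptions, not values from [17].
    Returns the final (altitude, vertical velocity).
    """
    alt, vel = 0.0, 0.0
    for _ in range(steps):
        # Outer loop: position error -> velocity setpoint.
        vel_setpoint = kp_outer * (target_alt - alt)
        # Inner loop: velocity error -> acceleration command (unit mass).
        accel = kp_inner * (vel_setpoint - vel)
        # Integrate the simplified dynamics over one time step.
        vel += accel * dt
        alt += vel * dt
    return alt, vel


if __name__ == "__main__":
    print(simulate_cascade_altitude(10.0))
```

With these assumed gains, the simulated altitude settles close to the target within the 20 simulated seconds. A real UAV twin would replace the two-line dynamics with its aerodynamics submodel.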

In modern industrial manufacturing, 3D DT models of products can be used as 
experimental objects in production process optimization. Taking into account the 
diverse attributes of the products, the authors in [18] constructed a 3D printed DT 
model, using a mechanistic model, a sensing and control model, and a statistical 
model together with big data and machine learning technology. In the proposed 
modelling scheme, each model has a specific use. The mechanistic model is used to 
estimate the metallurgical attributes, such as the transient temperature field, solid- 
ification morphology, grain structure, and phases present. The sensing and control 
model is then used to connect multiple sensors, such as an infrared camera for tem- 
perature measurement, an acoustic emission system for capturing surface roughness, 
and an in-situ synchrotron for monitoring selected geometric features. Besides these
models, machine learning technology is leveraged to compare the expected results
of the mechanistic model with the results obtained from big data sets to determine
strategies for tuning the modelling approach.

Although multidimensional modelling can match various application require- 
ments arising in complex environments, the coordination between heterogeneous 
submodels is not always efficient. Especially for some scenarios with dynamic and 
variable requirements, this multidimensional but fixed modelling approach can have 
poor scalability and is not suitable for flexible DT deployment. To address this 
problem, we can resort to a general modelling mechanism. The general model is 
always oriented to the multiple requirements of a certain application field. Based 
on the premise of comprehensively extracting the characteristic parameters of the 
physical objects, a general but complex DT mapping system is constructed by using 
standard software tools. For instance, in the field of industrial manufacturing, there 
are several instances of software development in general modelling for production 
design and operation analysis, such as Modelica [19], AutoMod [20], FlexSim [21],
and DELMIA [22]. 

Modelica is an open, object-oriented, equation-based general modelling language 
that can cross different fields and easily model complex physical systems, including 
mechanical, electronic, electric, hydraulic, thermal, control, and process-oriented 
subsystems models. Unlike Modelica, AutoMod is a computer modelling software 
package based on the AutoMod simulation language. It is mainly suitable for estab- 
lishing DT models of material handling, logistics, and distribution systems. AutoMod 
contains a series of logistics system modules, such as conveyor modules, automated 
access systems, and path-based mobile equipment modules. It covers 3D virtual 
reality animation, interactive modelling, statistical analysis, and other functions. 

Compared with the previous two general modelling tools, FlexSim and DELMIA 
have broader application scenarios. FlexSim is the only simulation software that
utilizes a C++ integrated development environment in a graphical model environment.
It is designed for engineers, managers, and decision makers to test, evaluate, and
visualize proposed solutions on operations, processes, and dynamic systems. It offers
full C++ object-oriented functionality, advanced 3D virtual reality, and an easy-to-follow
user interface. Moreover, owing to its flexibility, FlexSim can be customized for
almost all industry modelling scenarios. Another modelling tool, DELMIA, focuses on
a combination of front-end system design data and the resources of a manufacturing 
site and thus reflects and analyses entire manufacturing and maintenance processes 
through a 3D graphics simulation engine. The acquired digital data encompasses 
the visibility, accessibility, maintainability, manufacturability, and optimum perfor- 
mance of the production process. This tool provides a group of production-related 
libraries and smart visualizers in digital space for factory management. 

Scientific studies have also addressed general modelling methods. In [23], Schluse 
et al. proposed a DT modelling technology called Virtual Testbeds that provides 
comprehensive and interactive digital reflections of operation systems in various 
application scenarios. Moreover, these testbeds consistently introduce new structures 
and processes for simulations throughout their life cycle. In [24], Bao et al. designed 
a model-based definition technology to provide digital information carriers and twin 
images for industrial products during their design, manufacturing, maintenance, 
repair, and operation phases. As a typical general DT modelling technology, model- 
based definition technology fuses multidimensional model parameters into a single 
data source and enables industrial production and services to operate concurrently 
in virtual space. 


2.1.2 DT Modelling Challenges 


Although several DT modelling methods for industrial production, modern logistics, 
and wireless communications have been introduced in both academia and industry, 
there are still challenges to be addressed to achieve generalization, flexibility, and 
robustness of the modelling process. 

First, there is a lack of standardized frameworks that guide DT modelling in its
various forms. A complete DT system is usually composed of a variety of heterogeneous
subsystems. These subsystems differ significantly in their functions, structures, and 
elements. Therefore, different DT models, including geometric models, simulation 
models, business models, and so forth, need to be used to describe the respective 
subsystems. Although various modelling frameworks have been developed, none 
can simultaneously satisfy different virtual modelling requirements while accurately 
mapping the entire physical system. A standardized modelling framework is expected 
to be able to cope with various application requirements in different scenarios and 
stages and realize interoperability among the multiple heterogeneous submodels it 
contains. However, the design and implementation of this framework remain an 
unexplored problem. 

The second challenge is how to achieve high accuracy in DT modelling. Tradi- 
tional DT modelling approaches are based on general programming languages, sim- 
ulation languages, and software to construct the corresponding models. The model 
can serve only as a reference for the operation process of the physical system and can- 
not provide the core data required for virtual model construction with high-precision 
object descriptions and state prediction. In addition, traditional DT modelling can 
suffer from poor flexibility, complex configurations, and error proneness. 

Finally, how the DT models respond and react in real time to events occurring 
in the physical space is a critical challenge. In the real world, the characteristics of 
physical objects, such as their geometric shape, energy consumption, topological 
relations, and so on, change dynamically. To cope with these changes, the DT 
modelling should be updated accordingly. However, limited by sensing capability 
and data transmission capacity, it can be difficult to obtain comprehensive and real- 
time system state data in practical scenarios. How to perform high-fidelity model 
updates based on incomplete information acquisition in DT space is a problem 
worthy of future investigation. 
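
One common way to keep a twin's state estimate usable when some measurements never arrive is a predict/update filter that skips the update step for missing samples. The sketch below is a scalar Kalman filter with a random-walk state model. The model, the noise variances `q` and `r`, and the function name are assumptions made for illustration and are not taken from this chapter.

```python
def kalman_track(observations, q=0.01, r=0.25, x0=0.0, p0=1.0):
    """Track one scalar state of a physical object from sparse measurements.

    observations: list of floats, with None marking a sample that was lost.
    q, r: assumed process and measurement noise variances (hypothetical).
    Returns a list of (estimate, variance) pairs, one per time step.
    """
    x, p = x0, p0
    estimates = []
    for z in observations:
        # Predict: under a random-walk model the estimate is unchanged,
        # but its uncertainty grows by the process noise.
        p = p + q
        if z is not None:
            # Update: blend prediction and measurement by the Kalman gain.
            k = p / (p + r)
            x = x + k * (z - x)
            p = (1.0 - k) * p
        estimates.append((x, p))
    return estimates


if __name__ == "__main__":
    for est, var in kalman_track([1.0, None, None, 1.2]):
        print(round(est, 4), round(var, 4))
```

While samples are missing, the estimate is held and its variance grows, which gives the twin an explicit measure of how stale its view of the physical object has become.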


2.2 DT Networks (DTNs) 


A DTN is defined as a many-to-many mapping network constructed by multiple 
one-to-one DTs. In other words, a DTN uses advanced communication technologies 
to realize real-time information interactions between a physical object and its virtual 
twin, the virtual twin and other virtual twins, as well as the physical object and other 
physical objects. A DTN realizes the dynamic interactions and synchronized evolu- 
tion of multiple physical objects and virtual twins by using accurate DT modelling, 
communications, computing, and physical data processing technologies. In a DTN, 
physical objects and virtual twins can communicate, collaborate, share information, 
complete tasks with each other, and form an information-sharing network by con- 
necting multiple DT nodes. In this section, we first analyse the difference between DT 
and a DTN. Next, the communications in DTNs are discussed. Further, we depict 
some typical DTN application scenarios such as manufacturing, sixth-generation 
(6G) networks, and intelligent transportation systems. Finally, we point out open
research issues related to DTN. 


2.2.1 DTN Concepts 


Figure 2.2 compares the concepts of a DT and a DTN in terms of application 
scenarios, composition structure, and mapping relationships. 

Fig. 2.2 Comparison between a DT and a DTN


First, from the perspective of application scenarios, the concepts of DT and a 
DTN are different. DT is suitable for reflecting a single independent object, whereas 
a DTN models a group of objects with complex internal interactions. For example, 
modelling a building in virtual space through the DT approach helps optimize the 
entire life cycle of the building in terms of design, maintenance, and so on. The 
building model depends only on the analysis and decision making according to the 
building’s state data. In contrast, when building a virtual model of an industrial 
automation production line, a DTN should be used to model and reflect the col- 
laborative relationships between the multiple industrial components involved in the 
production process. 

Second, from the perspective of the operation mode, DT focuses on modelling 
an individual physical object in virtual space, and a DT model always gathers and 
processes the object’s state information in an independent mode without interacting 
with other models. Constrained by an individual DT model’s information collection 
and processing capabilities, the constructed object model might not be accurate 
enough, while both the time and energy consumption of this construction process
can be high. In contrast to DT, a DTN collaborates between multiple DTs to model
a group of objects. The information of the physical object, the processing capability 
of the DT model, and some intermediate processing results can be shared among the 
collaborative DTs. This cooperation approach significantly reduces processing time 
delays and energy consumption and greatly improves modelling efficiency. 

Finally, from the perspective of physical and virtual mapping relationships, DT 
provides comprehensive physical and functional descriptions of components, prod- 
ucts, or systems. The main goal of DT is to create high-fidelity virtual models to 
reproduce the geometric shapes, physical properties, behaviours, and rules of the 
physical world. Enabled by DT, virtual models and physical objects can maintain 
similar appearances as twin brothers and the same behaviour pattern as mirror im- 
ages. In addition, the model in digital space can guide the operation of the physical 
system and adjust physical processes through feedback. With the help of two-way 
dynamic mapping, both the physical object and the virtual model evolve together. 
Considering the mirroring effect of each physical and logical entity pair, we classify 
the mapping relationship between physical and virtual space in a DT system as one 
to one. We then characterize the mapping relationship of a DTN as many to many. 

In summary, DT is an intelligent and constantly evolving system that emphasizes 
a high-fidelity virtual model of a physical object. The mapping relationship between 
physical and virtual spaces in the DT system is one to one, with high scalability. A 
DTN is extended as a group of multiple DTs. By applying communications between 
DTs, a one-to-one mapping relationship can be easily expanded to a DTN. The 
mapping relationship is also more conducive to network management. Combined 
with advanced data processing, computing, and communications technologies, DTNs 
can easily facilitate information sharing and achieve more accurate state sensing, real- 
time analysis, efficient decision making, and precise execution on physical objects. 
Compared with DT, a DTN, which uses a network form to build complex large-scale 
systems, is more reliable and efficient. 


2.2.2 DTN Communications 


The establishment of a DTN relies on the information exchange and data commu- 
nication between the physical objects in the real world and the logical entities in 
virtual space. According to different combinations of communication object pairs, 
these communications can be divided into three types: physical-to-virtual, physical- 
to-physical, and virtual-to-virtual communications. 
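
The three communication types can be expressed as a small classification over node pairs. The following Python sketch is only an illustration: the `Node` class, its `kind` field, and the `link_type` function are hypothetical names, not part of any DTN standard or of this chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A DTN endpoint; kind is either "physical" or "virtual" (assumed labels)."""
    name: str
    kind: str


def link_type(a: Node, b: Node) -> str:
    """Classify a communication link by the kinds of its two endpoints."""
    kinds = {a.kind, b.kind}
    if kinds == {"physical"}:
        return "physical-to-physical"
    if kinds == {"virtual"}:
        return "virtual-to-virtual"
    if kinds == {"physical", "virtual"}:
        return "physical-to-virtual"
    raise ValueError(f"unknown node kinds: {sorted(kinds)}")


if __name__ == "__main__":
    car = Node("vehicle-1", "physical")
    car_twin = Node("vehicle-1-twin", "virtual")
    print(link_type(car, car_twin))  # endpoints of different kinds
```

Because the classification depends only on the set of endpoint kinds, the order of the two arguments does not matter.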

Physical-to-virtual communications can be considered the process of transferring 
information from a physical system to virtual entities. This type of communica- 
tion meets the requirements of the DT modelling process for the characteristic 
parameters of physical objects, and it can also feed back the modelling results 
to the physical space to guide parameter collection and transmission adjustment. 
Physical-to-virtual technology mainly uses wide area network wireless communica- 
tion paradigms, such as LoRa and fifth-generation/6G cellular communications. In 
these paradigms, the physical objects are wireless terminals connected to a wireless
access network through a wireless communication base station that further relays
data to a virtual twin connected to the Internet. The communication infrastructure
must be robust enough to support real-time interactions between the physical and
virtual spaces.

Physical-to-physical communications ensure information interactions and data 
sharing between physical objects. Various wireless or wired devices, such as sensors,
radio frequency identification tags, actuators, and controllers, can connect with
IoT gateways, WiFi access points, and base stations supporting physical-to-physical
communications. In addition, the network connections are enabled by diverse
communication protocols, such as wireless personal area network protocols (e.g., Zigbee)
and low-power wide area network technologies, including LoRa and Narrowband IoT.

Virtual-to-virtual communications, which logically encompass the virtual space, 
mirror the communication behaviour in the real physical world. For instance, in 
the Internet of Vehicles use case, virtual-to-virtual communications refer to data 
transmission between the DT model entities of the vehicles. Unlike communications 
between physical vehicles that consume vehicular wireless spectrum resources and 
radio power, this virtual mode depends mainly on DT servers' computing capabil- 
ity to model data transmission behaviours. Another key benefit of virtual-to-virtual 
communications is the data transmission modelling, which breaks through the time 
constraints of the physical world. We note that communications between actual 
vehicles consume a certain amount of time. However, in virtual space, the same 
communication behaviour can be completed much more quickly. Thus, we can re- 
flect or simulate a long period of communication behaviour with a low time cost. 
Furthermore, a given communication behaviour can logically occur in virtual space 
earlier than it actually occurs in physical space. The effect of logical communications 
can be leveraged to guide resource scheduling in the real world. Edge intelligence, 
which consists of artificial intelligence-empowered edge computing servers, is a 
critical enabling technology for achieving virtual-to-virtual communications. Edge
servers provide the necessary computing capability for channel model construction
and data transmission, while artificial intelligence learns the characteristics
of the physical network and adjusts the communication modelling strategies.
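
The point that virtual space is not bound by physical time can be illustrated with a minimal discrete-event simulation in Python. This is a sketch under strong assumptions: each transmission is reduced to a start time and a duration, no channel model is included, and the function name and message format are hypothetical. The simulated clock jumps directly from one delivery event to the next, so an hour of modelled communication costs only the computation needed to process the events.

```python
import heapq


def simulate_transmissions(messages):
    """Replay transmissions on a virtual clock.

    messages: list of (start_time_s, duration_s, label) tuples (assumed format).
    Returns a list of (delivery_time_s, label) ordered by simulated delivery time.
    """
    # Each event is keyed by the simulated time at which delivery completes.
    events = [(start + duration, label) for start, duration, label in messages]
    heapq.heapify(events)
    log = []
    while events:
        # Pop the earliest event; the clock advances without any real waiting.
        sim_time, label = heapq.heappop(events)
        log.append((sim_time, label))
    return log


if __name__ == "__main__":
    print(simulate_transmissions([(0, 3600, "bulk-upload"), (10, 5, "beacon")]))
```

The same structure also allows a behaviour to be evaluated in virtual space before it occurs physically, since event times are data rather than wall-clock constraints.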


2.2.3 DTN Applications 


With the development of DT technology, many application scenarios using DTN 
to assist process management and policy adjustment have emerged, such as smart 
manufacturing, 6G networks, and intelligent transportation systems. 

Subject to the high costs of updating production, traditional manufacturing has 
problems with low production efficiency and outdated product designs. The intro- 
duction of DTNs in new smart manufacturing can effectively address these problems. 
For the factory production line, by establishing a virtual model of the entire line, the 
production process can be simulated in advance and problems in the process found, 
thereby achieving more efficient production line management and process
optimization. Moreover, in the real production process, the virtual twin of the factory can be
continuously updated and optimized, including the DT model of factory construc- 
tion, product production, industrial equipment life prediction, system maintenance, 
and so forth. DTNs that match production requirements are helpful for achieving 
efficient digital management and low-cost manufacturing. 

6G networks aim to integrate a variety of wireless access mechanisms to achieve 
ultra-large capacity and ultra-small distance communications. In reaching these 
goals, 6G networks could face challenges in terms of security, flexibility, and spec- 
trum and energy efficiency. The emergence of DTNs provides opportunities to over- 
come these challenges. DTNs enable 6G networks to realize innovative services, such 
as augmented reality, virtual reality, and autonomous driving. A DTN can virtually 
map a 6G network. The virtually reflected 6G network collects the traffic informa- 
tion of the real communication network, implements data analysis to discover data 
traffic patterns, and detects abnormal occurrences in advance. The 6G network uses 
the information fed back from the virtualized network to prepare network security 
protection capabilities in advance. In addition, by collecting and analysing the
communication data in the DTN, communication patterns can be determined. Then, by
reserving communication resources accordingly, the demand for and supply of data
delivery services can be matched automatically.
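
As a simple illustration of detecting abnormal occurrences from collected traffic data, the Python sketch below flags samples whose z-score exceeds a threshold. This is a generic statistical technique chosen for illustration, not the detection method of any specific DTN or 6G system; the function name and the threshold of 3 are assumptions.

```python
from statistics import mean, pstdev


def flag_anomalies(traffic, threshold=3.0):
    """Return indices of samples that deviate strongly from the mean.

    traffic: list of traffic volumes observed by the virtual network.
    threshold: assumed z-score cutoff (hypothetical default).
    """
    mu = mean(traffic)
    sigma = pstdev(traffic)
    if sigma == 0:
        # All samples are identical, so nothing stands out.
        return []
    return [i for i, x in enumerate(traffic)
            if abs(x - mu) / sigma > threshold]


if __name__ == "__main__":
    samples = [10] * 20 + [100]
    print(flag_anomalies(samples))
```

A deployed system would more likely learn traffic patterns over time, but the same flow applies: the twin analyses mirrored traffic and feeds the flagged events back to the physical network.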

In recent years, the urban transportation system has experienced road network 
congestion and frequent traffic accidents. DTNs leverage multidimensional informa- 
tion sensors, remote data transmission, and intelligent control technology to provide 
information assistance services for intelligent traffic management and autonomous 
vehicle driving. First, a DTN provides a virtual vision of the transportation system, 
helping to dispatch traffic and optimize public transportation services. Next, by pro- 
cessing massive amounts of real-time traffic information, the virtual system of a 
DTN can accurately predict traffic accidents and thus help avert them.


2.2.4 Open DTN Research Issues 


As an emerging technological paradigm, DTNs have demonstrated strong physical 
system mapping and information assistance capabilities. Both DTN operation tech- 
nologies and application scenarios have been studied, but further research questions 
remain. 

Security is one of the key research issues of DTNs. A DTN is a complex sys- 
tem composed of virtual mappings of various networks and objects. This complex 
structure makes its security difficult to protect. Moreover, information sharing within 
virtual networks can raise security concerns. In a DTN, a pair of twins has a bidi- 
rectional feedback relationship. Even if the physical system in the real world is well 
secured, an attacker can easily change the parameters of the virtual model or the data 
fed back by the virtual model. Such attacks are particularly harmful to data-sensitive 
applications such as intelligent transportation systems and medical applications. 

DTNs rely on real-world information, whose gathering process can cause privacy
leaks. For example, in intelligent medical care, the virtual modelling of the
human body needs to collect various types of biological information and monitor
the patient’s daily activities. During treatment, sensitive data can be sent to and
processed on edge servers. Edge service operators could share these data with other
companies without user consent, which increases the risk of privacy breaches. How to
balance data utilization and privacy protection is a critical challenge for DTN
exploration.

Another research issue to consider is resource scheduling. The construction of a 
DTN consumes a variety of heterogeneous resources, including sensing resources for 
information collection, communication resources for data transmission, computing 
resources for modelling processing, and cache resources for model preservation. 
These resources jointly affect the efficiency and accuracy of DTN operation. The 
way to optimize resource scheduling is worthy of future investigation. 


Chapter 3
Artificial Intelligence for Digital Twin 


Abstract Artificial intelligence (AI) is a promising technology that enables machines 
to learn from experience, adjust to environments, and perform humanlike tasks. 
Incorporating AI with digital twin (DT) makes DT modelling flexible and accurate, 
while improving the learning efficiency of AI agents. In this chapter, we present the 
framework of AI-empowered DT and discuss some key issues in the joint application 
of these two technologies. Then, we introduce the incorporation paradigms of three 
AI learning approaches with DT networks. 


3.1 Artificial Intelligence in Digital Twin 


AI is a branch of computer science that enables learning agents to perform tasks 
that typically rely on human intelligence. Nowadays, the blooming of AI technology 
has brought powerful capabilities in environmental cognition, knowledge learning, 
action decision, and state prediction to smart machines, vehicles, and various types 
of Internet of Things (IoT) devices. 

However, despite the great advancements that AI has brought to industry, transportation,
healthcare, and other areas, applying AI is not always straightforward. In fact, the AI learning
process consists of continuous interactions between agents and the environmental 
system. The agents make decisions and take actions according to the current observed 
environment states, and these actions then react to and change the environment states, 
which triggers a new round of agent learning until the process finally converges. The 
interactive learning approach, which relies on real physical systems, is often costly 
and inefficient. For instance, when applying AI directly to real vehicles to train 
autonomous driving policies, vehicles can cause traffic accidents. Another example 
is leveraging AI to optimize the operation of cellular networks. Due to the large 
scale of cellular networks and their many subscribers, it takes a long time for AI 
agents to obtain feedback on state changes after performing actions, which seriously 
undermines AI learning effectiveness. Incorporating AI with simulation software 
seems a feasible approach to speed up the system feedback for AI actions. However,
due to nonlinear factors and uncertainty, it is hard to build a high-fidelity simulation
environment for a highly dynamic and complex system. Thus, the strategies and
actions learned in a simulation environment cannot be directly deployed to machines
in the real world.

Fig. 3.1 AI-empowered DT framework

To cope with this problem, we resort to DT technology. DT mirrors the forms,
states, and characteristics of physical objects in the real world in virtual space
with high fidelity and in real time. This mirror model eases our cognition of complex
physical systems and makes operations on virtual entities equivalent to those on
physical ones. Moreover, by leveraging the precise reflection capability of DT and the
intelligent adaptability of AI, the combination of DT and AI can benefit both parties.
On the one hand, with the aid of DT, AI learning methods can obtain high-fidelity 
state information from physical objects for model training, verify the effect of the 
learning strategy at low cost, and implement the life cycle management of complex 
systems. On the other hand, AI learning can continuously monitor the accuracy 
of DT models, dynamically adjust the DT mapping mechanism, and maintain the 
consistency between virtual space and physical space. 

To fully explore the benefits of incorporating AI and DT, we present the frame- 
work of AI-empowered DT shown in Fig. 3.1. This framework is mainly composed 
of two types of networks, namely, physical networks and DT networks. The physical 
networks are composed of various types of physical devices and different types of
resources served or consumed by these devices. As ubiquitous devices in the phys- 
ical networks, sensors such as cameras and lidars collect the real-time state from 
the physical environment. The state data carry the characteristics of the real world, 
such as the operating conditions of industrial equipment and driving behaviour of 
smart vehicles, which are useful for failure detection and traffic planning. Another 
type of device involves communication infrastructures and user terminals, for in- 
stance, cellular radio base stations and mobile phones. The data interaction between 
these devices mostly adopts wireless communications, which consume spectrum re- 
sources. Thus, managing the communication equipment mainly involves scheduling 
of channel resources. Furthermore, in the physical network, smart devices play an im- 
portant role in providing computation resources. Smart devices such as autonomous 
vehicles, edge servers, and robots can be equipped with very powerful CPU and GPU 
computing capabilities, compared to handheld user devices. For data-intensive
and computationally intensive tasks, performing local computations on user
equipment can consume excessive energy and bring about long delays. To address this
problem, these tasks can be offloaded to edge service-enabled smart devices for
efficient processing.
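
The offloading trade-off can be made concrete with a delay comparison. The Python sketch below uses a simplified delay model that is common in the edge computing literature but is not stated in this chapter: local delay is the required CPU cycles divided by the local clock rate, and offloading delay is the upload time plus the edge execution time. Result download time and energy consumption are ignored, and all function and parameter names are hypothetical.

```python
def local_delay(cycles, f_local_hz):
    """Time to run the task on the user device."""
    return cycles / f_local_hz


def offload_delay(data_bits, rate_bps, cycles, f_edge_hz):
    """Time to upload the input data and run the task on an edge device."""
    return data_bits / rate_bps + cycles / f_edge_hz


def should_offload(data_bits, rate_bps, cycles, f_local_hz, f_edge_hz):
    """Offload only if it finishes sooner than local execution."""
    return (offload_delay(data_bits, rate_bps, cycles, f_edge_hz)
            < local_delay(cycles, f_local_hz))


if __name__ == "__main__":
    # 8 Mbit of input, 2 Gcycles of work, 1 GHz device, 10 GHz-equivalent edge.
    print(should_offload(8e6, 16e6, 2e9, 1e9, 10e9))  # fast uplink
    print(should_offload(8e6, 1e6, 2e9, 1e9, 10e9))   # slow uplink
```

The two calls differ only in uplink rate, which shows why channel scheduling and computation scheduling have to be considered jointly.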

The data, communication, and computing resources mentioned above can be 
scheduled to serve various types of tasks in the physical network. However, the 
highly dynamic topology of mobile devices and communication interference arising 
in the physical environment pose significant challenges to resource efficiency and 
application performance. More specifically, the mobility and dynamic topology of 
devices make environmental data collection more difficult. In addition, due to wire- 
less interference, the received data will deviate from the sender's original data, which 
can lead to erroneous environmental cognition and resource scheduling decisions. 
To address these challenges, we turn to DT technology and formulate DT networks. 

A DT network is a mapping of a physical network in a virtual space that consists 
of virtual twins of all the physical units on the physical side. Data, spectrum, and 
computing resources contained in the DT network form logical entities that can 
be freely decomposed and flexibly combined. In addition, the resources in the DT 
network include some of the knowledge and experience that have been already gained 
and cached, such as the channel history states and known bandwidth allocation 
strategy in previous radio resource management. Since the DT network operates 
in a virtual space, there is no interference or error in the information interaction 
between DT entities, and the coordination of heterogeneous resources can also break 
the constraints of node locations and realize resource supply and demand services 
between distant nodes. In addition, based on historical information and knowledge, 
future network status trends can be accurately predicted, thereby facilitating effective 
resource management. 

Based on the DT networks formulated, two promising types of applications can 
be achieved. The first is a variety of relational analyses in complex systems and
highly dynamic environments, including objective overlap testing in distributed
optimization, competition for limited resources by multiple business nodes, action
cooperation among multiple nodes, and knowledge sharing collaboration among
a group of machine learning agents. The other type of DT application is strategy
testing and future state prediction. DT can provide low-cost policy verification in 
virtual space and obtain real-time result feedback. Moreover, during the forecasting 
process, the time axis can be easily and flexibly adjusted, allowing for efficient trend 
forecasting and data retrospectives. 

The AI module for scheduling resources can be divided into two parts. The 
first part involves learning schemes to determine the architecture as well as the 
components of AI models, and it can be mainly classified into deep reinforcement 
learning (DRL), federated learning (FL), and transfer learning (TL). Among these 
schemes, DRL has an architecture that combines the neural networks of deep learning
(DL) with the decision model of reinforcement learning (RL). Based on DRL, FL is a
multi-agent DRL framework that can protect the privacy of each agent. TL aims to
reuse an existing model to construct a new model and thereby speed up
convergence.
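
One common way in which FL protects agent privacy is that agents share model parameters rather than raw data. The Python sketch below shows a weighted parameter average in the style of federated averaging. It is a generic illustration and not an algorithm specified in this chapter; the function name and the flat-list representation of model weights are assumptions.

```python
def federated_average(client_weights, client_sizes):
    """Aggregate client model parameters without seeing client data.

    client_weights: list of equal-length parameter lists, one per agent.
    client_sizes: number of local training samples held by each agent,
                  used as the averaging weight.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


if __name__ == "__main__":
    # Two agents; the second holds three times as much data as the first.
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

The aggregated parameters would then be sent back to the agents as the starting point for the next round of local training.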

The second part of the AI module involves learning cooperation relationships that 
indicate the cooperation types between learning agents, including self-organizing, 
heterogeneous fusion, and mutual assistance. These relationships can be further
classified into individual learning and cooperative learning. Individual learning usually
converges faster than cooperative learning, since it does not experience a time delay
in information interaction. However, a lack of global information about a system
can cause the convergence point to be suboptimal. In contrast, cooperative learning
can usually achieve more accurate decision performance, but it often requires longer
convergence times, especially for large-scale complex systems.


3.2 DRL-Empowered DT 
3.2.1 Introduction to DRL 


In earlier years, machine learning methods represented by DL and RL were widely 
used to solve various problems in networks. DL aims to construct deep neural 
networks to identify characteristics from the environment, while RL aims to take 
optimal actions to obtain maximal rewards. More specifically, DL enables machines 
to imitate human activities such as hearing and thinking and to solve complex 
pattern recognition problems, making great progress in AI-related technologies. RL 
allows agents to imitate the human capacity to make decisions based on the 
current environment. However, both DL and RL have their drawbacks. For example, 
DL cannot explain decisions it has already made, and RL cannot identify high- 
dimensional states of the environment well. Combining DL with RL to design a 
new machine learning framework called DRL is a promising approach to address 
the above problem. DRL combines the perceptive ability of DL with the decision- 
making ability of RL. Moreover, DRL can learn control strategies directly from 
high-dimensional raw data, which is much closer to human learning compared to 
previously designed AI approaches. 
Fig. 3.2 DRL framework 


Essentially, DRL is applied to sequential decision making, which can be mathe- 
matically formulated as a Markov decision process (MDP). The DRL framework is 
shown in Fig. 3.2. In each time slot $t$, the agent observes the current environment 
state $s_t$ and uses its policy to select an action $a_t$. A policy can be considered a 
mapping from any state to an action. After the action $a_t$ is performed, the environment 
moves to state $s_{t+1}$ in the next time slot with transition probability $P(s_{t+1} \mid s_t, a_t)$. In 
addition, a corresponding reward $r_t = R(s_t, a_t)$ is obtained via the immediate reward 
function, which is the evaluative feedback of the action taken. Given a stationary 
and Markovian policy $\pi$, the distribution of the next state of the environment, $s_{t+1}$, is 
completely determined by the current state, $s_t$. In this context, the current policy together with the 
transition probability function determines the long-term cumulative reward. Assuming 
$\tau = (s_t, a_t, s_{t+1}, a_{t+1}, \ldots, s_T, a_T)$ is a trajectory from an MDP, the long-term 
cumulative reward can be defined as 


$$G(\tau) = \sum_{i=0}^{T-t} \gamma^{i} R(s_{t+i}, a_{t+i}), \tag{3.1}$$


where $\gamma \in (0, 1]$ is the discount factor that measures the importance of the future 
reward and $T$ is the length of an episode. For a continuous MDP, we have $T \to \infty$. 
In an MDP, the key issue is to find the optimal policy that maximizes the long-term 
cumulative reward. 
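
As a minimal illustration of the long-term cumulative reward defined above, the following Python sketch sums the discounted rewards collected along one finite trajectory. The function name `discounted_return` and the sample reward values are hypothetical and serve only as an illustration; they are not taken from any particular DRL library.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return sum_i gamma**i * rewards[i], where rewards[i] = R(s_{t+i}, a_{t+i})."""
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    total = 0.0
    # Accumulate backwards using G_i = r_i + gamma * G_{i+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    # Hypothetical rewards observed along one trajectory.
    print(discounted_return([1.0, 0.0, 2.0], gamma=0.5))
```

A discount factor close to 1 weights distant rewards almost as heavily as immediate ones, whereas a small discount factor makes the agent short-sighted.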


3.2.2 Incorporation of DT and DRL 


As a promising AI technology, DRL provides a feasible method for solving complex 
problems in unknown environments. However, there are still challenges to be resolved 
in the process of DRL learning and implementation, which are discussed below. 
High cost of the trial-and-error learning process: As a zero-knowledge experimental 
learning method, DRL maximizes the cumulative discounted reward by 
learning optimal state-action mapping policies through trial and error. However, in 
some application scenarios, especially in traffic safety-sensitive Internet of Vehicles 
applications and smart medical care related to patients’ lives, the cost of trial and 
error is too high to be acceptable. 

Frequent data transmission in learning: A large amount of state data needs to be 
input into the DRL system to train models and draw action strategies. For example, 
the channel spectrum status and real-time communication requirements of users are 
input for radio resource scheduling. Rapid and dynamic changes in environmental 
status and user requirements result in intensive data transmission and frequent state 
updates. Furthermore, as the dimensionality of the input data increases, so too 
does the time taken for the learning process to reach convergence. Thus, we 
find that it is difficult for the DRL method to meet the needs of delay-sensitive 
business scenarios such as the driving action control of autonomous vehicles and 
communication management in interactive multimedia applications. 

Interaction barriers between multiple agents in distributed DRL: Distributed DRL 
uses multiple agents to obtain the optimal action policy based on the environmental 
status. These agents can accelerate the learning process by sharing information 
when collaboratively working towards a common learning target. However, when 
the agents use wireless communication to share learning information, wireless signal 
fading and spectrum interference can lead to transmission errors and retransmission, 
which not only cause extra communication costs, but can also undermine training 
efficiency and learning convergence. 
Fig. 3.3 Cooperation of DT and DRL 


To address the above challenges, we turn to DT technology. Figure 3.3 illustrates 
how DT and DRL can cooperate to improve learning efficiency. First, since DT 
creates a high-fidelity virtual map of physical objects, DRL algorithms applied in 
the real world can be trained in the DT space. Different from the real training 
process in physical space, the trial-and-error process in DT training does not have 
unacceptable consequences, such as damage or injury to objects or humans due 
to wrong decisions. Second, the agents of DRL can obtain physical system states 
from the DT models without relying on communications between the agents and the 
physical objects, reducing data transmission delays. Compared with traditional DRL 
implemented in the physical space, the DRL model on the DT side can be trained 
for more rounds per unit time and converges faster. Finally, by modelling the DT 
of DRL agents on DT servers, the actual information interaction between agents in 
the physical space can be mapped to the information sharing between DT servers 
or within one server in virtual space. This virtual-to-virtual agent communication 
enables reliable information sharing between two agents and does not consume 
physical communication resources. 
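
The idea of running trial-and-error learning inside the DT space can be sketched in a few lines of Python. This is only an illustration under strong assumptions: tabular Q-learning stands in for a full DRL algorithm, and `TwinCorridor` is a hypothetical DT model of a five-cell corridor rather than a twin of any real system. All wrong moves are made against the twin, so they cost nothing in the physical world.

```python
import random


class TwinCorridor:
    """Hypothetical DT model: a corridor of cells 0..n-1 with the goal at cell n-1."""

    def __init__(self, n_cells: int = 5):
        self.n_cells = n_cells
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int):
        """Action 0 moves left, action 1 moves right; returns (state, reward, done)."""
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_cells - 1)
        done = self.state == self.n_cells - 1
        reward = 1.0 if done else -0.01  # small step cost, bonus at the goal
        return self.state, reward, done


def train_in_twin(episodes: int = 300, alpha: float = 0.5, gamma: float = 0.9,
                  epsilon: float = 0.2, seed: int = 0):
    """Learn a Q-table purely from interactions with the twin."""
    rng = random.Random(seed)
    twin = TwinCorridor()
    q = [[0.0, 0.0] for _ in range(twin.n_cells)]
    for _ in range(episodes):
        state = twin.reset()
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                action = 0 if q[state][0] > q[state][1] else 1  # exploit
            next_state, reward, done = twin.step(action)
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


if __name__ == "__main__":
    q_table = train_in_twin()
    # Greedy action for every non-terminal cell (1 means "move right").
    print([0 if left > right else 1 for left, right in q_table[:-1]])
```

Once the policy has been learned in the twin, only the resulting greedy actions would need to be applied to the physical object.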

On the DRL side, we note that the features, functions, and behaviours of physical 
objects are often high dimensional, making it difficult to describe them directly in 
the DT modelling process. With the help of DRL, these high-dimensional data are 
extracted and refined by neural networks into lower-dimensional data that are easier 
to process. Furthermore, DRL can help handle some of the unique problems of 
DT, such as DT placement and DT migration algorithms, and make DT technology 
adaptable to different time-varying environments. 

Numerous recent studies have investigated the cooperation of DRL and DT. 
Among these works, the resource management of sixth-generation (6G) networks 
has attracted much attention from researchers. In [25], the authors considered the 
dynamic topology of the edge network and proposed a DT migration scenario. They 
adopted a multi-agent DRL approach to find the optimal DT migration policy by 
considering both the latency of updating DT and the energy consumption of data 
transmission. In [26], the authors proposed an intelligent task offloading scheme 
assisted by DT. The mobile edge services, mobile users, and channel state information 
were mapped into DT to obtain real-time information on the physical objects and 
radio communication environments. Then, a reliable mobile edge server with the best 
communication link quality was selected to offload the task by training the data stored 
in the DT with the double deep-Q learning algorithm. In [27], the authors proposed 
a mobile offloading scheme in a DT edge network. The DT of the edge server maps 
the state of the edge server, and the DT of the entire mobile edge computing system 
provides training data for offloading decisions. The Lyapunov optimization method 
was leveraged to simplify the long-term migration cost constraint in a multi-objective 
dynamic optimization problem, which was then solved by actor-critic DRL. This 
solution effectively diminishes the average offloading latency, the offloading failure 
rate, and the service migration rate while saving system costs with DT assistance. 

DT technology and DRL can be seamlessly fused to achieve intelligent man- 
ufacturing. In [28], the authors proposed a DT- and RL-based production control 
method. This method replaces the existing dispatching rule in the type and instance 
phases of a micro smart factory. In this method, the RL policy network is learned 
and evaluated by coordination between DT and RL. The DT provides virtual event 
logs that include states, actions, and rewards to support learning. In [29], the authors 
proposed the automation of factory scheduling by using DT to map manufacturing 
cells, simulate system behaviour, predict process failures, and adaptively control 
operating variables. Moreover, based on one of the cases, the authors presented the 
training results of the deep Q-learning algorithm and discussed the development 
prospects of incorporating DRL-based AI into the industrial control process. By 
applying the DRL method, process knowledge can be obtained efficiently, manufac- 
turing tasks can be arranged, and optimal actions can be determined, with strong 
control robustness. 

In addition to the above work, previous studies have applied DT and DRL to 
emerging applications. In [30], the authors analysed a multi-user offloading system 
in which the quality of service is reflected by the response time of the services. 
Because the DT-empowered Internet of Vehicles is computationally intensive, edge 
computing devices can become overloaded under excessive service requests; the authors 
adopted a DRL approach to obtain the optimal offloading decision and thereby address this problem. In 
[31], the authors discussed the feedback of traditional flocking motion methods 
for unmanned aerial vehicles (UAVs) and proposed a DT-enabled DRL training 
framework to address the sim-to-real problem that restricts the application 
of DRL to the flocking motion scenario. 


3.2.3 Open Research Issues 


Although the cooperation of DRL and DT has shown great potential in some scenar- 
ios, there are still problems that warrant investigation. The first problem is resource 
scheduling. The volume of data of physical objects in DT is huge, and the deployment 
of DRL at the edge also requires computing resource services. Therefore, reducing 
redundant data and designing lightweight DRL models are significant issues in the 
combination of DT and DRL. 

Another issue is environmental dynamics. The DT modelling process can involve 
a dynamic and time-varying environment, with a wide variety of physical objects, 
and the data and computing resources required for the corresponding modelling 
processing can also differ. In addition, the high-speed movement of physical objects 
and the dynamic changes of wireless channels will further exacerbate the uncertainty 
of environmental characteristics. Although DRL can provide an optimal strategy for 
DT resource scheduling, a continuously and dynamically changing environment can 
seriously undermine learning efficiency. Therefore, improving the flexibility and 
adaptability of DRL to dynamic DT modelling is an important issue to be addressed. 


3.3 Federated Learning (FL) for DT 
3.3.1 Introduction to FL 


The proliferation of AI learning techniques has provided unprecedentedly powerful 
applications to areas including smart manufacturing, autonomous driving, and in- 
telligent healthcare. With these diverse AI applications, two critical challenges have 
emerged that must be addressed. The first challenge is learning scalability. In a system 
with many widely distributed nodes, using a traditional centralized AI mechanism in 
the learning process requires significant amounts of data to be collected and incurs 
heavy transmission overhead, creating a great burden on the processing capability of a few 
centralized agents. Another challenge centres around privacy protection. The system 
states or data resources gathered for learning related to factory production tech- 
niques, route navigation preferences, and an individual’s personal physical condition 
invariably contain sensitive information, requiring a strong privacy guarantee. 

FL has been widely regarded as an appealing approach to address the above 
challenges. FL is a privacy-preserving model-training technology that relies on 
distributed agents to collect data and exploit local training resources. 
Unlike centralized AI, which depends purely on the capability of a few central agents, 
in FL multiple geodistributed agents perform model training in parallel without 
sharing sensitive raw data, thus helping ensure privacy and reducing communication 
costs. 
Fig. 3.4 Main flow of the FL process 


Figure 3.4 shows the main flow of the FL process. First, a central agent initializes 
a global model, denoted as $w_0$, and broadcasts this model to the other distributed 
agents. Then, after each distributed agent receives $w_0$, it uses locally collected data 
to update the parameters of this model and obtains a local model that minimizes 
the loss function, defined as 


$$F(w_i^t) = \frac{1}{|D_i|} \sum_{x_i \in D_i} f(x_i), \tag{3.2}$$


where $w_i^t$ is the local model of agent $i$ in learning iteration $t$, $f(x_i)$ is the loss 
of the local model on data sample $x_i$, and $D_i$ is the local data 
set of agent $i$. This loss function is used to measure the accuracy of the local model 
and guide the model update in a gradient descent approach, which is written as 


$$w_i^{t+1} = w_i^t - \xi \cdot \nabla F(w_i^t), \tag{3.3}$$


where $\xi$ is the learning step. Next, each distributed agent uploads its local model to 
the central agent and waits for an aggregation step, which can be written as 


$$w^{t+1} = \frac{1}{N} \sum_{i=1}^{N} \alpha_i w_i^{t+1}, \tag{3.4}$$


where $\alpha_i$ is the coefficient of agent $i$ and $N$ is the number of collaborating learning 
agents. When the aggregation is completed, the central agent will republish the 
updated global model to the distributed agents. The iterations repeat in this manner 
until the global model converges or reaches a predetermined accuracy. 
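
The FL loop described above (local loss, local gradient step, central aggregation, rebroadcast) can be sketched as follows. This is a toy illustration under stated assumptions, not the implementation of any FL framework: the model is a single scalar weight, the per-sample loss is assumed to be the squared error, all aggregation coefficients are assumed equal to 1, and every function name is hypothetical. Note that only model values, never the raw samples, are passed to `aggregate`.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (input x, target y)


def local_loss(w: float, data: Sequence[Sample]) -> float:
    """Eq. (3.2): mean per-sample loss; squared error is an assumed choice of f."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def local_gradient(w: float, data: Sequence[Sample]) -> float:
    """Gradient of local_loss with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in data) / len(data)


def local_update(w: float, data: Sequence[Sample], step: float) -> float:
    """Eq. (3.3): one gradient descent step on the local data set."""
    return w - step * local_gradient(w, data)


def aggregate(local_models: Sequence[float], coefficients: Sequence[float]) -> float:
    """Eq. (3.4): coefficient-weighted sum of the local models divided by N."""
    n = len(local_models)
    return sum(a * w for a, w in zip(coefficients, local_models)) / n


def federated_training(datasets: List[List[Sample]], rounds: int = 50,
                       step: float = 0.1, w0: float = 0.0) -> float:
    """Run the broadcast / local update / aggregation loop for a fixed number of rounds."""
    w_global = w0
    coefficients = [1.0] * len(datasets)  # assumed equal coefficients
    for _ in range(rounds):
        # Each agent starts from the broadcast global model and trains locally.
        local_models = [local_update(w_global, d, step) for d in datasets]
        w_global = aggregate(local_models, coefficients)
    return w_global


if __name__ == "__main__":
    # Hypothetical local data sets, all generated by y = 2x.
    agents = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)], [(0.5, 1.0), (1.5, 3.0)]]
    print(round(federated_training(agents), 4))
```

In practice the fixed number of rounds would be replaced by the stopping rule in the text, that is, convergence of the global model or a predetermined accuracy.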


3.3.2 Incorporation of DT and FL 


Although FL is a promising paradigm that enables collaborative training and miti- 
gates privacy risks, its learning operation still has several challenges and limitations. 

Complexity and uncertainty of model characteristics: Large-scale dynamic sys- 
tems usually have diverse features that correlate with each other, which means it 
is very difficult for FL to extract them from system events. Moreover, during the 
learning operation, unplanned events such as weather changes, traffic accidents, and 
equipment failures can further confuse the training inputs and undermine model 
convergence. 

Asynchrony between heterogeneous cooperative agents: As a distributed AI 
framework, FL leverages multiple geographically distributed agents to train their 
local models in parallel and then aggregates a parametric model in a central agent. 
There is heterogeneity in the training environment where each agent is located in 
terms of the number of physical entities, the size of the region, the frequency of 
event changes, and the differences in agents' processing capacity. This heterogene- 
ity makes it hard to synchronize the aggregation of FL across multiple distributed 
agents. Although previous works have been devoted to the design of asynchronous 
FL mechanisms, most of them have improved the learning convergence at the cost 
of model accuracy. How to achieve both learning efficiency and model precision is 
still an open question. 

Interaction bottleneck between collaborative agents: Considering the distributed 
training and central aggregation characteristics of FL, frequent interactions are re- 
quired between the client agents and the central agent, especially for learning systems 
with high-dimensional feature parameters and highly dynamic environments. In such 
a case, where wireless communications are used to realize the interactions between 
agents, the efficiency of local model aggregation and global model distribution can 
be severely undermined due to the data transfer bottleneck caused by the limited 
wireless spectrum and disturbed 
Fig. 3.5 Benefits of applying DT in FL 


To address the above challenges, we turn to DT technology. Figure 3.5 illustrates 
the benefits of applying DT in FL. First, reflecting complex physical entities and 
environments into DT space can eliminate unnecessary interference factors, thereby 
helping FL to mine the core features of the system and further explore their in- 
terrelationships. Second, for the problem of asynchronous heterogeneous training 
regions, using a mirrored virtual environment built by DT to replace all or part of the 
regional systems affected by slow response can greatly improve these regions’ local 
model convergence speeds. The training between regions is thus synchronized, and 
both learning efficiency and accuracy can be achieved. Finally, the DT mappings of 
multiple regions can be constructed on a single computing server, and the real data 
communications between the agents located in different regions in the physical space 
can be mapped to the interactions between multiple learning processes in the virtual 
space. Therefore, DT can free the collaborative agents in FL from the constraints of 
physical communication resources. 

We note that the many benefits provided by DT to FL depend on the ability of the 
twin models in the virtual space to map physical entities and networks 
accurately and in real time. Due to the potential dynamics of physical networks, 
the DT mapping strategy needs to be adjusted accordingly. Considering the large- 
scale and distributed characteristics of the physical entities, using FL to optimize the 
mapping strategy seems an appealing approach. More specifically, in the integration 
of DT and FL, DT mapping accuracy can be included as an element of the learning 
reward, and the parameters of the DT mapping strategy can be added to the learning 
action space. 
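
One way to picture this integration is the sketch below, in which the action carries a DT mapping parameter and the reward contains a mapping-accuracy term. Everything here is an assumption made for illustration: the field names, the accuracy measure, and the additive weighting are hypothetical and are not prescribed by the works cited in this chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JointAction:
    """Hypothetical action: a task-level decision plus a DT mapping parameter."""
    bandwidth_share: float   # task-level resource decision in [0, 1]
    sync_interval_s: float   # how often the twin is refreshed from the physical side


def mapping_accuracy(twin_state: float, physical_state: float) -> float:
    """Assumed accuracy measure in (0, 1]; equals 1 when twin and physical states coincide."""
    return 1.0 / (1.0 + abs(twin_state - physical_state))


def reward(task_utility: float, twin_state: float, physical_state: float,
           accuracy_weight: float = 0.5) -> float:
    """Learning reward that adds a weighted DT mapping accuracy term to the task utility."""
    return task_utility + accuracy_weight * mapping_accuracy(twin_state, physical_state)


if __name__ == "__main__":
    action = JointAction(bandwidth_share=0.3, sync_interval_s=2.0)
    # A stale twin (state 4.0 versus 3.0) earns less reward than an exact one.
    print(action, reward(1.0, 4.0, 3.0), reward(1.0, 3.0, 3.0))
```

The weight on the accuracy term controls how strongly the learner is pushed towards mapping strategies that keep the twin close to the physical system.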

Recently, research attempts have focused on applying DT with FL. Among these 
works, the Industrial IoT (IIoT), which enables manufacturers to operate with massive 
numbers of assets and gain insights into production processes, has turned out to be an 
important application scenario. In [32], the authors intended to improve the quality 
of services of the IIoT and incorporated DT into edge networks to form a DT edge 
network. In this network, FL was leveraged to construct IIoT twin models, which 
improves IIoT communication efficiency and reduces its transmission energy cost. 
In [33], the authors used DT to capture the features of IIoT devices to assist FL 
and presented a clustering-based asynchronous FL scheme that adapts to the IIoT 
heterogeneity and benefits learning accuracy and convergence. In [34], the authors 
focused on resource-constrained IIoT networks, where the energy consumption of 
FL and digital mapping become the bottleneck in network performance. To address 
this bottleneck, the authors introduced a joint training method selection and resource 
allocation algorithm that minimizes the energy cost under the constraint of the 
learning convergence rate. 

In preparation for the coming 6G era, DT technology and FL can be seamlessly 
fused to trigger advanced network scheduling strategies. In [9], the authors presented 
an FL-empowered DT 6G network that migrates real-time data processing to the 
edge plane. To further balance the learning accuracy and time cost of the proposed 
network, the authors formulated an optimization problem for edge association by 
jointly considering DT association, the training data batch size, and bandwidth 
allocation. In [35], the authors applied dynamic DT and FL to air-ground cooperative 
6G networks, where a UAV acts as the learning aggregator and the ground clients 
train the learning model according to the network features captured by DTs. 

In the area of cybersecurity, blockchain has emerged as a promising paradigm 
to prevent the tampering of data. Since both the ledger storage of blockchain and 
the model training process of FL are distributed, blockchain can be introduced 
into DT-enabled FL. In [36], the authors utilized blockchain to design a DT edge 
network that facilitates flexible and secure DT construction. In this network, a double 
auction-based FL and local model verification scheme was proposed that improves 
the network's social utility. In [37], the authors proposed a blockchain-enabled 
FL scheme to protect communication security and data privacy in digital edge 
networks, and they introduced an asynchronous learning aggregation strategy to 
manage network resources. 
In addition to the above work, previous studies have applied DT and FL to emerging 
applications. In [38], the authors used the COVID-19 pandemic as a new use 
case of these two technologies and proposed a DT-FL collaboratively empowered 
training framework that helps the temporal context capture historical infection data 
and COVID-19 response plan management. In [39], the authors applied these two 
technologies to edge computing-empowered distribution grids. A DT-assisted resource 
scheduling algorithm was proposed in an FL-enabled DT framework that 
outperforms benchmark schemes in terms of the cumulative iteration delay and 
energy consumption. 


3.3.3 Open Research Issues 


The incorporation of FL with DT is a promising way to improve learning efficiency 
while guaranteeing user privacy. However, there are still unexplored questions in the 
joint application of these two technologies. The first question worth investigating 
is the operation matching between DT and FL. The training process of FL requires 
many iterations, which consume massive computing resources and generate a certain 
time delay. Since DT modelling also depends on intensive computation, competition 
for resources arises between DT and FL. Effective resource scheduling is thus a 
critical research challenge. Moreover, the key advantage of DT is the ability to 
accurately map the physical world into virtual space in real time. When using FL to 
improve DT modelling accuracy, how to make the slow iterative learning direct the 
DT mapping strategy in a timely manner is still a problem for future research. 

Another unexplored question concerns privacy. To reflect physical systems and ob- 
jects fully and accurately, DT modelling inevitably needs to extract massive amounts 
of system data and user information, which can lead to privacy leakage. On the other 
hand, the use of FL is an attempt to protect users' private information. How to ensure 
privacy protection while improving the accuracy of DT modelling is also a challenge 
to be addressed. 


3.4 Transfer Learning (TL) for DT 
3.4.1 Introduction to TL 


In traditional distributed intelligence networks, multiple machine learning agents 
equipped on edge servers, smart vehicles, and even powerful IoT devices, work 
independently. In some application scenarios, multiple agents in similar environ- 
ments can learn with the same goal. If these agents start training at different times, 
agents that start later may learn their strategies from scratch. A complete training 
process always incurs a great deal of resource consumption and long training delays, 
posing a critical challenge for resource-constrained devices serving delay-sensitive 
computing tasks. 

TL, which is a branch of AI with low learning costs and high learning efficiency, 
provides a promising approach to meet these challenges. Unlike the traditional ma- 
chine learning agent that tries to learn a new mission from scratch, a TL agent 
receives prior knowledge from other agents that have performed similar or related 
missions, and then starts learning with the aid of this knowledge, thus achieving 
faster convergence and better solutions. 
Fig. 3.6 TL framework 


Figure 3.6 illustrates the TL framework. The bottom of the figure shows various 
types of model training and strategy learning tasks generated by IoT devices. 
Multiple agents with TL capabilities are deployed to handle these tasks. We note that 
TL-inspired learning is a gradual process that consists of continuous environment 
awareness, constant action exploration, and persistent strategy improvement. As the 
learning proceeds, valuable knowledge, such as neural network parameters, 
state-action pairs, action exploration experience, and the evaluation of existing strategies, 
is generated and recorded. This knowledge not only is the basis for the learning of 
the local agent in subsequent stages, but also can be shared with other agents, which 
can then jump directly from the initial learning stage, without any experience, to an 
intermediate stage with certain prior knowledge. 

In the TL framework, a transfer controller module manages the sharing process, 
including the pairing of the transfer source and target agents, knowledge building 
and preprocessing, knowledge data delivery, and caching among the agents. 
It is worth noting that edge resources play a vital role in the TL framework. On the 
one hand, these resources can serve in IoT applications, such as vehicular commu- 
nications, popular video caching, and sensing image recognition, while multi-agent 
machine learning is leveraged for resource scheduling. On the other hand, we resort 
to TL to improve machine learning efficiency and reduce scheduling time costs. 
However, the knowledge sharing process can create the need for extra communi- 
cation, computing, and cache resources. Thus, there exists a trade-off in resource 
allocation, that is, whether to use the resources to directly enhance IoT application 
performance or for learning efficiency improvement and service delay reduction. 

TL can offer many benefits in multi-agent distributed learning scenarios, the main 
advantage being the reduction in training time of the target agent of the knowledge 
sharing process. The shared prior knowledge can effectively guide the agent to 
quickly converge to and reach optimal action strategies without time-consuming 
random exploration. In addition, TL can reduce training resource consumption. Each 
training step requires analysis and calculation. A faster training process means fewer 
steps, as well as lower computing and energy resource consumption. Moreover, for 
machine learning approaches that record large numbers of state-action pairs, the 
shortened training process provided by TL also reduces the record sizes, thereby 
saving on cache resources. 
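
The training-time benefit can be illustrated with a deliberately simple sketch: gradient descent on a one-parameter quadratic loss, started either from scratch or from a parameter inherited from a related source task. The loss, the step size, and the numerical values are hypothetical; the sketch only shows that a starting point closer to the optimum needs fewer update steps.

```python
def steps_to_converge(w_init: float, target: float, step: float = 0.1,
                      tol: float = 1e-3, max_steps: int = 10_000) -> int:
    """Count gradient descent steps on the toy loss (w - target)**2 until |w - target| < tol."""
    w = w_init
    for n in range(max_steps):
        if abs(w - target) < tol:
            return n
        w -= step * 2.0 * (w - target)  # gradient of (w - target)**2
    return max_steps


if __name__ == "__main__":
    source_optimum = 4.8   # hypothetical knowledge learned on a related source task
    target_optimum = 5.0   # optimum of the new target task
    print("from scratch:", steps_to_converge(0.0, target_optimum))
    print("with transfer:", steps_to_converge(source_optimum, target_optimum))
```

Fewer steps translate directly into the lower computing, energy, and cache consumption discussed above, provided that the cost of delivering the prior knowledge does not outweigh the saving.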


3.4.2 Incorporation of DT and TL 


Despite the benefits provided by TL, unaddressed challenges remain in TL scheme 
implementation, especially in application scenarios with multiple associated hetero- 
geneous agents. Due to the associations between such agents, multiple TL node pairs 
can be formed. Thus, the first challenge is the choice of transferring source when 
the target mission has multiple potential knowledge providers. For example, when 
multiple UAVs are agents in training terrain models based on sensing data, these 
UAVs hover and cruise at different altitudes and can have overlapping or even the 
same modelling area. The beneficial prior knowledge of a UAV agent performing a 
learning mission can exist in multiple neighbouring UAVs. Source determination is a 
prerequisite before the learning implementation. However, it is difficult to determine 
the appropriate transferring pairs solely according to the physical characteristics and 
superficial associations in the physical world. Another challenge is what knowledge 
should be transferred. The prior knowledge learned by heterogeneous agents can take 
various forms and provide diverse learning gains between different transferring pairs. 
Knowledge selection and organization are the basis of effective TL. However, since 
knowledge is an abstract concept, it is hard to measure and schedule it accurately in 
physical space. 

Incorporating DT with TL is a feasible approach to address the above challenges. 
In terms of the effect of DT on TL, by leveraging the comprehensive mapping 
ability of DT from a physical system to virtual space, multi-agents’ environmental 
characteristics, neural network structure, and learning power, as well as their current 
training stages can be clearly presented in a logical form. This logical representation 
allows the TL scheduler to find optimal TL source-destination agent pairs based on 
the similarity of environmental features or the matching of knowledge supply and 
demand. Moreover, DT models existing in the virtual space are suitable for describing 
the knowledge attributes acquired by each agent. For example, knowledge can be 
logically represented as a tuple DT model composed of an owner, information items, 
the application scope, transfer gains, transfer costs, and other elements. 
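
The tuple representation of knowledge can be sketched as a small data structure, together with one possible rule for pairing a target agent with a source. The class `KnowledgeTwin`, its field types, and the net-gain selection rule are assumptions made for illustration; the text does not fix a particular schema or matching criterion.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class KnowledgeTwin:
    """Hypothetical DT record describing the knowledge held by one agent."""
    owner: str
    information_items: Tuple[str, ...]
    application_scope: str
    transfer_gain: float   # expected benefit for the target agent (assumed units)
    transfer_cost: float   # communication, computing, and cache cost (same units)


def select_source(candidates: Iterable[KnowledgeTwin],
                  scope: str) -> Optional[KnowledgeTwin]:
    """Pick the in-scope candidate with the largest positive net gain, if any."""
    best = None
    for k in candidates:
        net = k.transfer_gain - k.transfer_cost
        if k.application_scope != scope or net <= 0:
            continue
        if best is None or net > best.transfer_gain - best.transfer_cost:
            best = k
    return best


if __name__ == "__main__":
    # Hypothetical neighbouring UAVs offering prior knowledge.
    offers = [
        KnowledgeTwin("uav-1", ("network weights",), "terrain-modelling", 5.0, 2.0),
        KnowledgeTwin("uav-2", ("state-action pairs",), "terrain-modelling", 6.0, 4.5),
        KnowledgeTwin("uav-3", ("network weights",), "video-caching", 9.0, 1.0),
    ]
    chosen = select_source(offers, "terrain-modelling")
    print(chosen.owner if chosen else None)
```

Because gains and costs are explicit fields of the twin record, the scheduler can compare candidate sources in virtual space without probing the physical agents.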




From the perspective of the role played by TL in the DT process, especially in 
scenarios of distributed multi-DT models, TL can share the construction experience 
of the completed DT model, such as the model structure, constituent elements, and 
update cycle, with the DT models that have been or have yet to be formed. This 
knowledge transfer scheme greatly shortens DT construction delays and improves 
DT model accuracy. Moreover, since DT processes consume considerable com- 
munication and computing resources, TL can also be used in several similar DT 
environments to reuse resource scheduling strategies. 

TL has been used in many areas to improve the efficiency of distributed learning. 
For instance, in [40], the authors proposed a deep uncertainty-aware TL framework 
for COVID-19 detection that addresses the problem of the lack of medical images in 
neural network training. In [41], the authors introduced a TL-empowered aerial edge 
network that uses multi-agent machine learning to draw optimal service strategies 
while leveraging TL to share and reuse knowledge between UAVs to save on resource 
costs and reduce training latency. In [42], TL was used in action unit intensity 
estimation, where known facial features were inherited in new estimation scenarios 
at minimal extra computational cost. 

Along with the development of DT technology, a few studies have been dedicated 
to the incorporation of DT and TL. In [43], the authors focused on anomaly detec- 
tion in dynamically changing network functions virtualization environments. They 
used DT to measure a virtual instance of a physical network in capturing real-time 
anomaly-fault dependency relationships while leveraging TL to utilize the learned 
knowledge of the dependency relationships in historical periods. In [44], the authors 
introduced a DT and deep TL jointly enabled fault diagnosis scheme that diagnoses 
faults in both the development and maintenance phases. In this scheme, the previ- 
ously trained diagnosis model can be migrated from virtual space to physical space 
for real-time monitoring. Considering that DT models are usually customized for 
specific scenarios and could lack sufficient environmental adaptability, the authors 
in [45] leveraged TL to explore an adaptive evolution mechanism that improves 
remodelling efficiency under the premise of limited environmental information. 


3.4.3 Open Research Issues 


As recent emerging technologies, DT and TL, as well as their incorporation, still 
have open research issues to be explored. The first issue concerns knowledge trans- 
fer between heterogeneous training models. Training models can differ among TL 
agents, in terms of their learning methods, neural network structures, and knowl- 
edge cache organization. Although DT can describe these training models logically 
and consistently in virtual space, during TL implementation, how to preprocess and 
match the knowledge between source and target agents to improve the transfer effect 
is still a key challenge. 

The second issue involves resource scheduling in DT-empowered TL. Various 
types of resources play a key role in TL for knowledge data delivery, storage, and
processing, and DT’s model building and updating also consume these resources.
Competition for constrained resources can thus take place during cooperation
between TL and DT. How to coordinate resource scheduling between the two and
improve the efficiency of knowledge transfer while ensuring modelling accuracy is 
therefore also a key question to be addressed. 

Finally, an issue to be considered is DT construction that adapts to TL opera- 
tions. TL usually occurs between multiple agents distributed in a large-scale system, 
whereas DT systems always construct models on a small number of centralized 
servers. How to solve the contradiction between the distributed architecture of TL 
and the centralized construction of DT requires further exploration. 


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, 
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate 
credit to the original author(s) and the source, provide a link to the Creative Commons license and 
indicate if changes were made. 

The images or other third party material in this chapter are included in the chapter’s Creative 
Commons license, unless indicated otherwise in a credit line to the material. If material is not 
included in the chapter’s Creative Commons license and your intended use is not permitted by 
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from 
the copyright holder. 




Chapter 4 
Edge Computing for Digital Twin 


Abstract Mobile edge computing is a promising solution for analysing and process- 
ing a portion of data using the computing, storage, and network resources distributed 
on the paths between data sources and a cloud computing centre. Mobile edge com- 
puting thus provides high efficiency, low latency, and privacy protection to sustain 
digital twin. In this chapter, we first introduce a hierarchical architecture of digital 
twin edge networks that consists of a virtual plane and a user/physical plane. We then 
introduce the key communication and computation technologies in the digital twin 
edge networks and present two typical cooperative computation modes. Moreover, 
we present the role of artificial intelligence (AI) for digital twin edge networks, and 
discuss the unique edge association problem. 


4.1 Digital Twin Edge Networks 
4.1.1 Digital Twin Edge Network Architecture 


In traditional cloud computing-assisted digital twin modelling, the centralized server 
collects data and constructs twin mappings of the physical components, which leads 
to large communication loads. In this context, digital twin edge networks, a new 
paradigm that integrates mobile edge computing (MEC) and digital twin to build 
digital twin models at the network edge, has emerged as a crucial area. In digital 
twin edge networks, the edge nodes—for example, base stations (BSs) and access 
points—can collect running states of physical components and develop their be- 
haviour model along with the dynamic environment. Furthermore, the edge nodes 
continuously interact with the physical components by monitoring their states, to 
maintain consistency with their twin mappings. Hence, the networking schemes (e.g.
decision making, prediction, and scheduling) can be directly designed and optimized
in the constructed digital twin edge networks, which improves the efficiency
of networking schemes and reduces costs. To better understand the internal logic
of digital twin edge networks, we first present a hierarchical architecture of these
networks that consists of a virtual plane and a user/physical plane, as shown in
Fig. 4.1.

Fig. 4.1 A hierarchical architecture of digital twin edge networks


The user/physical plane is distinguished by typical digital twin application sce- 
narios, such as an intelligent transportation system, the Industrial Internet of Things 
(IoT), and sixth-generation (6G) networks. The virtual plane generates and main- 
tains the virtual twins of physical objects by utilizing digital twin technology at the 
edge and on cloud servers. Specifically, devices in the user/physical plane include 
vehicles, sensors, smart terminals, and so forth. These devices need to synchro- 
nize their data with the corresponding virtual twins in real time through wireless 
communication technologies. Meanwhile, these devices also accept feedback from 
their virtual twins for instantaneous control and calibration. Therefore, mobile edge 
networks are expected to provide communications and computations that satisfy 
the main requirements of low latency, high reliability, high speed, and privacy and 
security preservation, to support real-time interactions between physical and virtual 
planes.


4.1.1.1 Communications


Communications between physical and virtual planes in typical digital twin edge 
network scenarios can be summarized as follows. 


* Intelligent transportation systems: In recent years, urban transportation systems 
have faced such problems as traffic jams and traffic accidents. Digital twin edge 
networks can provide a virtual vision of the transportation system that can help 
to manage traffic and optimize public transportation service planning efficiency 
[46]. For example, traffic accidents can be effectively predicted and avoided by 
processing the massive amounts of real-time transportation information in the 
virtual plane [47]. Digital twin edge networks can also offer new opportunities 
for maintaining transportation facilities. By simulating the usage of transportation 
facilities in the virtual plane, facility malfunctions can be predicted in advance, 
which helps managers to schedule appropriate maintenance actions. 
Vehicle-to-everything communications allow vehicles to communicate with other 
vehicles and their virtual twins via wireless links, which can be realized by dedi- 
cated short-range communications and fifth-generation/6G communications [48]. 
In digital twin edge network-enabled intelligent transportation systems, vehicles’ 
running states and perceived environmental information need to be transmitted 
to the virtual plane to update the virtual twins. However, it is challenging to 
guarantee strict data transmission delays, since vehicles move at high speeds. A 
detailed communications design must be carefully considered for physical plane 
and virtual plane interactions in such a dynamic network environment. 

* Internet of Things (IoT): With the increasing scale of the IoT, digital twin is one 
of the most promising technologies enabling physical components to be connected
with their virtual twins in digital space by using different sensing, communication, 
computing, and software analytics technologies, to provide configuration, mon- 
itoring, diagnostics, and prognostics for maintaining physical systems [49, 50]. 
For example, in manufacturing, digital twin edge networks can be utilized for 
different aspects of manufacturing to improve production efficiency and reduce 
product life cycles [51, 52]. When designing parts, their full life cycle can be 
simulated through a virtual model, and design defects can be found in advance 
to realize accurate parts design. In factory production lines, through a virtual 
model of the entire production line, the production process can be simulated in 
advance and problems in the process found, to achieve more efficient production 
line management and process optimization. Additionally, in the health domain, 
digital twin edge networks can be utilized to establish twin patients. The twin 
patients can collect patients’ physiological status and life style, medication input 
data, and data about the patients’ emotional changes over time. Thus, twin patients 
can enable medical experts to provide patients with a full range of medical care 
and even accurately predict changes. 

Machine-to-machine and device-to-device (D2D) communications are enabling 
technologies for the digital twin edge network-empowered IoT [53]. Physical
components can form clusters and transmit shared status data to the corresponding
virtual twins by reusing the unoccupied uplink spectrum resources. Machine-to-machine
and D2D communications can improve data transmission rates during
physical plane and virtual plane interactions. However, privacy and security pro- 
tection of the information in virtual twin formation is a critical issue, since some 
core data, such as users’ personal information, must be continuously updated 
for the virtual twins, and malicious attackers could intercept this information 
through wireless communications. Hence, privacy and security protection mech- 
anisms need to be designed for physical and virtual plane interactions in the 
IoT. 

* 6G networks: 6G networks aim to realize ultra-high-capacity and ultra-short-distance
communications, go beyond best effort and high-precision communications,
and converge multiple types of communications [27]. Thus, 6G networks
can face challenges in security, spectral efficiency, intelligence, energy efficiency, 
and affordability. The emergence of digital twin edge networks introduces op- 
portunities to overcome these challenges. Digital twin edge networks provide 
corresponding virtual twins of 6G network components, which can collect traf- 
fic information on the entire network and use data analysis methods to discover 
network traffic patterns and detect abnormal traffic in advance. 6G networks use 
the information fed back from the virtual twins to make preparations in advance 
to improve network performance. In addition, by collecting and analysing the 
communication data in networks, rules of communication can be discovered to 
automate demand and provide services on demand. Since communication demand 
can be predicted in advance, the information can be fed back to the 6G networks 
to reserve resources, such as spectrum resources. 
The interactions between the physical and virtual planes in digital twin- 
empowered 6G networks demand high data rates. Small cell communication is an 
efficient solution for improving spectral efficiency by deploying heterogeneous 
infrastructures, such as pico and micro BSs [54]. In small cell communication, 
all BSs are equipped with rich computational resources and are responsible for 
generating and maintaining the virtual twins of physical objects in the cells. 
Additionally, intelligent communication infrastructures, such as reconfigurable 
intelligent surfaces [55] and unmanned aerial vehicles [56, 57], can be leveraged 
to realize interactions between the physical and virtual planes. 


4.1.1.2 Computations for Resource-Intensive Tasks in the Virtual Plane 


Beyond communications with low latency and high reliability, the resource-intensive 
tasks executed by digital twin edge networks require large amounts of computational 
resources. The virtual plane in the hierarchical architecture of digital twin edge 
networks consists of multiple distributed edge servers and central cloud servers. 
Specifically, central cloud servers have strong processing, caching, and computing 
capabilities. Resource-intensive tasks that focus on computation speed and central- 
ized processing can be deployed on central cloud servers. Through cloud servers, 
large amounts of data can be processed in a short time (a few seconds), to provide
powerful digital twin services for physical objects. In addition, the cloud architecture
facilitates the organization and management of large numbers of connected physical 
objects and virtual twins, as well as the combination and integration of real-time 
data and historical experience. 

In addition, edge servers have computing (i.e. CPU cycles) and caching resources 
distributed on the paths between data sources and the cloud computing centre that can 
analyse and process a portion of the data from both physical objects and virtual twins. 
Edge servers can be deployed in the network infrastructure, such as at BSs, roadside 
units, wireless access points, gateways, and routers, or they can be mobile phones, 
vehicles, and other devices with the necessary processing power and computing and 
storage capabilities. Considering the proximity of edge servers to physical objects, 
delay-sensitive tasks can be deployed on edge servers to provide digital twin services 
for users with high efficiency.

Fig. 4.2 Cooperative edge computing in digital twin edge networks


4.1.2 Computation Offloading in Digital Twin Edge Networks 


In digital twin edge network scenarios, data processing and analysis require great
amounts of computing resources. Nevertheless, a single edge server often cannot
meet these requirements, because its capacity is limited. For example, when an edge
node has many computing tasks with a long task queue, it can easily create high
latency. Cooperative computation is an approach for offloading computing tasks to
other nodes that have free computing resources, to reduce task processing latency.
According to different cooperation methods among the nodes, the following two 
cooperative computation modes can be used. 


4.1.2.1 Cooperative Edge Computing 


In cooperative edge computing, as shown in Fig. 4.2, if other edge nodes have free 
computing resources, they should share in the computing tasks of the overloaded 
edge nodes. It is very important for multiple edge nodes to maintain workload 
balance and provide low-latency computing services, particularly when a digital 
twin edge network provides services for time-sensitive scenarios, as in intelligent
transportation systems.
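
The workload balancing described above can be sketched in a few lines. This is a minimal illustration under our own assumptions, not a scheme from the cited literature: each edge node is summarized by a CPU frequency and a queue of pending CPU cycles, the transfer delay between nodes is ignored, and the names `EdgeNode` and `pick_node` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EdgeNode:
    """Hypothetical summary of an edge node's load (illustrative only)."""
    name: str
    cpu_hz: float          # available CPU cycles per second
    queued_cycles: float   # cycles already waiting in the task queue

    def finish_time(self, task_cycles: float) -> float:
        """Seconds until a new task of `task_cycles` would complete here."""
        return (self.queued_cycles + task_cycles) / self.cpu_hz


def pick_node(nodes: List[EdgeNode], task_cycles: float) -> EdgeNode:
    """Choose the node with the earliest completion time and enqueue the task."""
    best = min(nodes, key=lambda n: n.finish_time(task_cycles))
    best.queued_cycles += task_cycles
    return best


nodes = [
    EdgeNode("bs-1", cpu_hz=2e9, queued_cycles=8e9),  # overloaded node
    EdgeNode("bs-2", cpu_hz=2e9, queued_cycles=1e9),  # node with free resources
]
chosen = pick_node(nodes, task_cycles=2e9)
print(chosen.name)
```

A practical scheduler would also account for the delay of moving the task data between nodes, which this sketch deliberately leaves out.
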


Fig. 4.3 Cooperative cloud-edge-end computing in digital twin edge networks 


4.1.2.2 Cooperative Cloud-Edge-End Computing 


As shown in Fig. 4.3, cooperative cloud-edge-end computing is necessary to meet the 
demand for large-scale computations and AI for real-time modelling and simulation 
in digital twin edge networks. Edge servers process the data that need to be responded 
to in real time. The cloud server provides strong computing power and the integration 
of various types of information. The interaction between edge nodes and the cloud 
in real time can solve the problem of data heterogeneity for the cloud. Cooperative 
cloud-edge-end computing can provide low-latency computation, communications, 
and virtual twin continuous updating for digital twin edge networks. In addition, 
when the storage resources of the edge nodes are insufficient, the cloud can store
part of the data and transmit them to the client through the network when needed,
which saves storage resources on the edge servers. 

In this section, we first present a hierarchical architecture of digital twin edge 
networks that consists of a virtual plane and a user/physical plane. Then, we illustrate 
key communications and computation technologies between the physical and virtual 
planes in the typical digital twin edge network scenarios and present two cooperative 
computation modes. 


4.2 AI for Digital Twin Edge Networks


The integration of digital twin with AI [58] opens up new possibilities for efficient 
data processing in applying digital twins in 6G networks. MEC, one of the key 
enabling technologies for 6G, can considerably reduce system latency by executing 
computations based on AI algorithms at the edge of the network. AI-empowered
MEC has been widely investigated for accomplishing edge intelligence tasks such
as computation offloading, content caching, and data sharing. In [59], the authors
proposed an AI-empowered MEC scheme in the industrial IoT (IIoT) framework. In [60],
the authors proposed an intelligent content caching scheme based on deep reinforcement
learning (DRL) for an edge computing framework. AI can significantly improve the
construction efficiency and optimize the running performance of digital twin edge
networks. The system model, communication model, and computation model of
AI-empowered digital twin edge networks are as follows.


4.2.1 System Model 


4.2.1.1 AI-Empowered Network Model 


We consider the AI-empowered digital twin edge network shown in Fig. 4.4. Our
wireless digital twin network system comprises three layers: a radio access layer 
(i.e. end layer), a digital twin layer (i.e. edge layer), and a cloud layer. The radio 
access layer consists of entities such as mobile devices and vehicles that have limited 
computing and storage resources. Through wireless communications, these entities 
connect to BSs and request services provided by network operators. In the digital 
twin layer, some BSs are equipped with MEC servers to execute computation tasks, 
while other BSs provide wireless communication services to end users. The digital 
twins of the physical entities are modelled and maintained by the MEC servers. Since 
the number of entities in the physical layer is much larger than the number of MEC 
servers in the digital twin layer, an MEC server can maintain multiple digital twins 
of physical entities. In the cloud layer, cloud servers are equipped with large amounts 
of computing and storage resources. Tasks that are computation sensitive or require 
global analysis can be executed in the cloud layer.

Fig. 4.4 The architecture of wireless digital twin networks


Since digital twins reproduce the running of physical entities, maintaining the 
digital twins of massive devices consumes a large number of resources, including 
computing resources, communication resources, and storage resources. To relieve 
the resource limitation in the edge layer, we model digital twins as one of two types: a 
device digital twin or a service digital twin. The device digital twin is a full replica of 
the physical devices, which includes the information of the hardware configuration, 
the historical running data, and real-time states. The device digital twin for user $u_i$
can be expressed as


$$DT(u_i) = (D_i, S_i(t), M_i, \Delta S_i(t+1)), \tag{4.1}$$


where $D_i$ is the historical data of user device $i$, such as the configuration data
and historical running data. The term $S_i(t)$ represents the running state of device
$i$, which consists of $r_1$ dimensions and varies with time, and it can be denoted as
$S_i(t) = \{s_i^1(t), s_i^2(t), \ldots, s_i^{r_1}(t)\}$. The term $M_i$ is the behaviour model set of $u_i$,
which consists of $r_2$ behaviour dimensions, $M_i = \{m_i^1, m_i^2, \ldots, m_i^{r_2}\}$, and
$\Delta S_i(t+1)$ is the state update of $S_i(t)$ in time slot $t+1$. Taking a meteorological IoT
device as an example, $S_i(t)$ can be the temperature, humidity, wind speed, location, and so on.
The behaviour models $M_i$ can consist of the variation models of the temperature,
humidity, and wind speed. In this chapter, we mainly focus on the scenarios of device
digital twin to conduct our study.
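
To make the tuple in Eq. (4.1) concrete, the following minimal sketch stores the four terms in one container. It is an illustration under our own assumptions: the class name `DeviceDigitalTwin`, its methods, and the toy temperature model are hypothetical, and the state update is interpreted here as the change predicted by the behaviour models.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, float]


@dataclass
class DeviceDigitalTwin:
    """Illustrative container for the terms of Eq. (4.1)."""
    history: List[State] = field(default_factory=list)   # D_i, historical data
    state: State = field(default_factory=dict)           # S_i(t), running state
    models: Dict[str, Callable[[State], float]] = field(default_factory=dict)  # M_i

    def predict_update(self) -> State:
        """Delta S_i(t+1): predicted change of each modelled dimension."""
        return {k: m(self.state) - self.state[k] for k, m in self.models.items()}

    def synchronize(self, new_state: State) -> None:
        """Archive the current state and adopt the state reported by the device."""
        if self.state:
            self.history.append(dict(self.state))
        self.state = dict(new_state)


twin = DeviceDigitalTwin(
    state={"temperature": 20.0, "humidity": 0.5},
    # Toy behaviour model (assumption): temperature rises by 1% per time slot.
    models={"temperature": lambda s: s["temperature"] * 1.01},
)
delta = twin.predict_update()
twin.synchronize({"temperature": 20.3, "humidity": 0.48})
```

Keeping the behaviour models as plain callables keeps the sketch short; a real twin would hold trained models instead.
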

Different from a device digital twin, a service digital twin is a lightweight digital 
replica constructed by extracting the running states of several devices for a specific 
application. Similar to (4.1), the service digital twin can be expressed as 


$$DT(u_i, \varepsilon) = G(D_i(\varepsilon), S_i^{\varepsilon}(t), M_i^{\varepsilon}, \Delta S_i^{\varepsilon}(t+1)), \tag{4.2}$$


where $\varepsilon$ is the target service, and $D_i(\varepsilon)$, $S_i^{\varepsilon}(t)$, $M_i^{\varepsilon}$, and
$\Delta S_i^{\varepsilon}(t+1)$ are the corresponding terms related to the target service $\varepsilon$. For
example, vehicles driving in
the same region can be modelled into a specific service digital twin for supporting
autonomous driving on a particular stretch of road. In such a case, the service digital
twin for autonomous driving collects only the driving information of these vehicles
and analyses their driving behaviour to guide them. Depending on the required scale,
service digital twins can be constructed on the edge server or the cloud server.


4.2.2 Communication and Computation Model 


The communication between end users and edge servers contains the uplink com- 
munication for transmitting data from user devices to edge servers and the downlink 
communication for sending the results from edge servers back to user devices. Note 
that the size of the results returning to users is much smaller than that of the up- 
dated data, so we consider only uplink communication latency in our communication 
model. The maximum achievable uplink data rate $r_{ij}$ between user $i$ and BS $j$ is
given as

$$r_{ij} = W \log\left(1 + \frac{p_{ij} h_{ij}}{W N_0}\right), \tag{4.3}$$

where $h_{ij}$ denotes the channel power gain of user $i$, $p_{ij}$ denotes the corresponding
transmission power for user $i$, $N_0$ is the noise power spectral density, and $W$ is the
channel bandwidth. The transmission latency for uploading $D_i$ from user $i$ to BS $j$
can be expressed as

$$T_{ij}^{com} = \frac{D_i}{r_{ij}}. \tag{4.4}$$
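
As a quick numerical illustration of Eqs. (4.3) and (4.4), the sketch below evaluates the uplink rate and the resulting upload latency. The function names and the example values are our own assumptions, and we assume a base-2 logarithm so that the rate is expressed in bit/s.

```python
import math


def uplink_rate(bandwidth_hz, tx_power_w, channel_gain, noise_psd_w_per_hz):
    """Eq. (4.3), assuming a base-2 logarithm so the rate is in bit/s."""
    snr = tx_power_w * channel_gain / (bandwidth_hz * noise_psd_w_per_hz)
    return bandwidth_hz * math.log2(1.0 + snr)


def upload_latency(data_bits, rate_bps):
    """Eq. (4.4): time needed to upload `data_bits` at `rate_bps`."""
    return data_bits / rate_bps


# Arbitrary example values, chosen so that the signal-to-noise ratio equals 3.
rate = uplink_rate(bandwidth_hz=1e6, tx_power_w=0.3,
                   channel_gain=1e-7, noise_psd_w_per_hz=1e-14)
latency = upload_latency(data_bits=4e6, rate_bps=rate)
print(rate, latency)
```

With these values the rate is about 2 Mbit/s, so uploading 4 Mbit takes roughly 2 s.
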

The wired transmission latency between BSs is highly correlated to the transmission
distance. Let $\phi$ be the latency required for transmitting one unit of data over one
unit distance. Then the wired transmission latency can be written as

$$T^{B} = \phi \cdot D_i \cdot d(j_1, j_2), \tag{4.5}$$

where $D_i$ is the size of the transmitted data and $d(j_1, j_2)$ is the distance between
BSs $j_1$ and $j_2$.

We denote the total computation resource of edge server $j$ as $F_j$. The computation
resource of edge server $j$ can be allocated to multiple user devices to maintain their
digital twins on server $j$. Let $f_{ij}$ denote the computation resource assigned to the
digital twin of user $i$. Then the time to execute tasks from user $i$ can be expressed as

$$T_{ij}^{cmp} = \frac{D_i}{f_{ij}}, \tag{4.6}$$

where $D_i$ is the size of the computation task from user $i$, $\sum_{i=1}^{N} x_{ij} f_{ij} \le F_j$, and
$x_{ij} = 1$ if $f_{ij} > 0$; otherwise, $x_{ij} = 0$.
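
The capacity constraint and Eq. (4.6) can be checked with a small sketch. The proportional split used here is only one possible allocation rule and is our own assumption, as are the function names; task sizes are expressed in CPU cycles for simplicity.

```python
def proportional_allocation(task_sizes, total_capacity):
    """Split F_j across twins in proportion to their task sizes D_i."""
    total = sum(task_sizes.values())
    return {i: total_capacity * d / total for i, d in task_sizes.items()}


def compute_latency(task_sizes, allocation):
    """Eq. (4.6): T_ij^cmp = D_i / f_ij for every twin with f_ij > 0."""
    return {i: task_sizes[i] / f for i, f in allocation.items() if f > 0}


tasks = {"u1": 2e9, "u2": 6e9}   # D_i in CPU cycles (assumption)
F_j = 4e9                        # total capacity of server j in cycles per second
alloc = proportional_allocation(tasks, F_j)
within_capacity = sum(alloc.values()) <= F_j + 1e-6   # capacity constraint
latency_cmp = compute_latency(tasks, alloc)
print(alloc, latency_cmp)
```

A proportional split equalizes the execution time of all twins on the server, which is why both users see the same latency in this example.
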


4.2.3 Latency Model


Fig. 4.5 The digital twin construction process


The latency of maintaining a digital twin mainly consists of two parts: the construction
delay and the synchronization delay. Figure 4.5 shows the complete process
for constructing a digital twin of user $u_i$. In the beginning, the running data $D_i$ of $u_i$
are transmitted to their nearby BS through wireless communication. Then the nearby
BS transmits through wired communication the running data $D_i$ to the digital twin
server $DT_1$ that is responsible for constructing and maintaining the digital twin of $u_i$.
The digital twin server $DT_1$ runs the computation to process and analyse the received
data and builds a digital twin model for user $u_i$, as expressed by Eq. (4.1). During
the digital twin computation process, AI-related algorithms are used to extract the
data features and to train the digital twin model. Finally, the results of the digital
twin model are transmitted back to user $u_i$ through wired and wireless communications.
The feedback results provide $u_i$ with insights for improving its service quality
or running efficiency for specific applications. The system latency consists of the
following items.


1. Wireless data transmission: In the construction phase of $DT(u_i)$, the historical
running data of user $i$ must be transmitted to its digital twin server through
its nearby BS. Let $D_i$ denote the size of the historical data to be transmitted.
The wireless communication latency $T_{ij}^{com}$ from user $i$ to its BS $j$ can then be
calculated according to Eq. (4.4).

2. Wired data transmission: The wired transmission time from the nearby BS $j$ of $u_i$
to its digital twin server $k$ is

$$T_{jk}^{com} = \phi \cdot D_i \cdot d(j, k). \tag{4.7}$$

The total communication time for transmitting the historical data of $u_i$ to its
digital twin server is thus

$$T_{ik}^{com} = T_{ij}^{com} + T_{jk}^{com}. \tag{4.8}$$

3. Digital twin data computation: The computation time at digital twin server $k$ is

$$T_{ik}^{cmp} = \frac{D_i}{f_{ik}}. \tag{4.9}$$

The total latency for constructing the digital twin of user $i$ is

$$T_{ik}^{ini} = T_{ij}^{com} + T_{jk}^{com} + T_{ik}^{cmp}. \tag{4.10}$$

The digital twin of user $i$, that is, $DT(u_i)$, is constructed on its digital twin server
$DT_k$. Then, $DT(u_i)$ must constantly interact with $u_i$ to remain consistent with the
running states of $u_i$. We denote the size of the updated data as $\Delta D_i$. The latency for
one update can then be expressed as

$$T_{ik}^{upd} = \frac{\Delta D_i}{r_{ij}} + \phi \cdot \Delta D_i \cdot d(j, k) + \frac{\Delta D_i}{f_{ik}}. \tag{4.11}$$

The synchronization latency in one unit time slot can be written as

$$T_{ik}^{syn} = \frac{1}{\Delta t} T_{ik}^{upd}, \tag{4.12}$$

where $\Delta t$ denotes the time gap between every two updates.
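
The construction and synchronization latencies above share the same three terms (wireless, wired, and computation), so they can be evaluated by one helper. The sketch below is a simplified illustration: the `Link` container and the function names are hypothetical, and a single unit is assumed for data sizes across all three terms.

```python
from dataclasses import dataclass


@dataclass
class Link:
    rate: float      # r_ij, wireless uplink rate (data units per second)
    phi: float       # wired latency per unit of data per unit distance
    distance: float  # d(j, k) between the nearby BS and the twin server
    cpu: float       # computation resource assigned to the twin


def one_pass_latency(data: float, link: Link) -> float:
    """Wireless + wired + computation time for `data` units of data."""
    wireless = data / link.rate
    wired = link.phi * data * link.distance
    compute = data / link.cpu
    return wireless + wired + compute


def sync_latency(update_size: float, link: Link, gap: float) -> float:
    """Per-update latency multiplied by the 1/gap updates per unit time."""
    return one_pass_latency(update_size, link) / gap


link = Link(rate=2e6, phi=1e-9, distance=100.0, cpu=4e6)
t_ini = one_pass_latency(4e6, link)        # construction with D_i = 4e6
t_syn = sync_latency(4e5, link, gap=0.5)   # updates of size 4e5 every 0.5 time units
print(t_ini, t_syn)
```

The example shows why frequent updates matter: halving the update gap doubles the synchronization latency accumulated per unit time.
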


4.3 Edge Association for Digital Twin Edge Networks 
4.3.1 System Model 


Due to the dynamic computing and communication resources available through edge 
servers, the association of digital twins to corresponding servers is a fundamental 
problem in digital twin edge networks that needs to be comprehensively explored. 
Moreover, since the federated learning in digital twin edge networks requires multiple 
communications for data exchange, the limited communication resources need to be 
optimally allocated to improve the efficiency of digital twins in the associated edge 
servers. Thus, in this section, we design a digital twin wireless network (DTWN) 
model and define the edge association problem for digital twin networks. A permis- 
sioned blockchain-empowered federated learning framework for edge association is 
also proposed. 

We consider a blockchain- and federated learning-empowered digital twin network
model as depicted in Fig. 4.6. The system consists of $N$ end users, such as
IoT devices and mobile devices, $M$ BSs, and a macro BS (MBS). The BSs and the
MBS are equipped with MEC servers. The end devices generate running data and
synchronize their data with the corresponding digital twins that run on the BSs. We
use $D_i = \{(x_{i1}, y_{i1}), \ldots, (x_{iD_i}, y_{iD_i})\}$ to denote the data of end user $i$, where, with a
slight abuse of notation, $D_i$ also denotes the data size, $x_i$ is the data collected by end
users, and $y_i$ is the label of $x_i$. The
digital twin of end user $i$ in the BSs is denoted as $DT_i$, which is composed of the
behaviour model $M_i$, static running data $D_i$, and the real-time dynamic state $s_t$,
so that $DT_i = (M_i, D_i, s_t)$, where $D_i$ and $s_t$ are the essential data required to run
the digital twin applications. Instead of synchronizing all the raw data to the digital
twins, which incurs a huge communication load and the risk of data leakage, we
use federated learning to learn the model $M_i$ from the user data. In various application
scenarios, the end users can communicate with other end users to exchange running
information and share data, through, for example, D2D communications. Thus, the
digital twins also form a network based on the connections of end users. Based on
the constructed DTWN, we can obtain the running states of the physical devices and
make further decisions to optimize and drive the running of the devices by directly
analysing the digital twins.

Fig. 4.6 The proposed digital twin wireless network

In our proposed digital twin network model, we use federated learning to execute 
the training and learning process collaboratively for edge intelligence. Moreover, 
since the end users lack mutual trust and the digital twins consist of private data, 
we use permissioned blockchain to enhance the system security and data privacy. 
The permissioned blockchain records the data from digital twins and manages the 
participating users through permission control. The blockchain is maintained by the 
BSs, which are also the clients of the federated learning model. The MBS runs as 
the server for the federated learning model. In each iteration of federated learning, 
the MBS distributes the machine learning model parameters to the BSs for training.
The BSs train the model based on the data from the digital twins and return the
model parameters to the MBS.
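
The iteration described above, in which the MBS broadcasts the parameters, the BSs train locally, and the MBS aggregates the results, can be sketched as follows. This is a minimal toy example under our own assumptions: a one-parameter linear model, a data-size-weighted average as the aggregation rule, and hypothetical function names. The blockchain layer is omitted.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y)


def local_train(w: float, data: List[Sample], lr: float = 0.1, steps: int = 5) -> float:
    """Client (BS) side: a few gradient steps on squared loss for y = w * x."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(w_global: float, clients: List[List[Sample]]) -> float:
    """Server (MBS) side: broadcast, collect, and average weighted by data size."""
    updates = [(local_train(w_global, d), len(d)) for d in clients]
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


# Two BSs hold twin data generated by y = 3x; the raw samples never leave them.
clients = [[(1.0, 3.0), (2.0, 6.0)], [(0.5, 1.5)]]
w = 0.0
for _ in range(20):
    w = federated_round(w, clients)
print(w)
```

Only the scalar parameter travels between the BSs and the MBS, which mirrors the motivation of avoiding raw data synchronization.
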

We use orthogonal frequency division multiple access for wireless transmission 
in our system. To upload trained local models, all the BSs share C subchannels to 
transmit their parameters. The achievable uplink data rate from BS $i$ to the MBS is

$$R_i^U = \sum_{c=1}^{C} \tau_{i,c} W^U \log_2\left(1 + \frac{p_{i,c}^U h_{i,c}^U r_{i,m}^{-\alpha}}{\sum_{j \in \mathcal{N}'} p_{j,c}^U h_{j,c}^U r_{j,m}^{-\alpha} + N_0}\right), \tag{4.13}$$

where $C$ is the total number of subchannels, $\tau_{i,c}$ is the time fraction allocated to BS
$i$ on subchannel $c$, and $W^U$ is the bandwidth of each subchannel, which is a constant
value. The transmission power is $p_{i,c}^U$ and the uplink channel gain on subchannel $c$
is $h_{i,c}^U$; $r_{i,m}^{-\alpha}$ is the path loss fading of the channel between BS $i$ and the MBS; $r_{i,m}$
is the distance between BS $i$ and the MBS; $\alpha$ is the path loss exponent; $N_0$ is the
noise power; and $\sum_{j \in \mathcal{N}'} p_{j,c}^U h_{j,c}^U r_{j,m}^{-\alpha}$ is the interference caused by other BSs using
the same subchannel. In the download phase, the MBS broadcasts the global model
with the rate

$$R_i^D = \sum_{c=1}^{C} W^D \log_2\left(1 + \frac{P_{i,c}^D h_{i,c}^D r_{i,m}^{-\alpha}}{\sum_{j \in \mathcal{N}'} P_{j,c}^D h_{j,c}^D r_{j,m}^{-\alpha} + N_0}\right), \tag{4.14}$$

where $P_{i,c}^D$ is the downlink power of BS $i$, $h_{i,c}^D$ is the channel gain between BS $i$ and
the MBS, and $\sum_{j \in \mathcal{N}'} P_{j,c}^D h_{j,c}^D r_{j,m}^{-\alpha}$ is the downlink interference.
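
The uplink rate in Eq. (4.13) is a sum over subchannels of a time-weighted capacity term with interference in the denominator. The sketch below evaluates it for given received powers; the function names and the example values are our own assumptions.

```python
import math
from typing import Sequence


def received_power(tx_power_w: float, gain: float, distance_m: float, alpha: float) -> float:
    """p * h * r^(-alpha): transmit power after channel gain and path loss."""
    return tx_power_w * gain * distance_m ** (-alpha)


def ofdma_uplink_rate(tau: Sequence[float], bandwidth_hz: float,
                      signal_w: Sequence[float],
                      interference_w: Sequence[float],
                      noise_w: float) -> float:
    """Sum over subchannels of tau_c * W * log2(1 + SINR_c).

    signal_w[c]       received power of this BS on subchannel c
    interference_w[c] summed received power of the other BSs on subchannel c
    """
    rate = 0.0
    for t, s, i in zip(tau, signal_w, interference_w):
        rate += t * bandwidth_hz * math.log2(1.0 + s / (i + noise_w))
    return rate


# Two subchannels with arbitrary example values.
sig = [received_power(1.0, 1.0, 10.0, 2.0)] * 2
rate_up = ofdma_uplink_rate(tau=[0.5, 1.0], bandwidth_hz=1e6,
                            signal_w=sig, interference_w=[0.0, 0.009],
                            noise_w=0.001)
print(rate_up)
```

In this example the second subchannel carries interference from other BSs, so it contributes far less per unit of allocated time than the interference-free first subchannel.
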


4.3.2 Edge Association: Definition and Problem Formulation 


The end devices or users are mapped to the digital twins in the BSs in the DTWN. 
The maintenance of digital twins consumes a large amount of computing and com- 
munication resources for synchronizing real-time data and building corresponding 
models. However, the computation and communication resources in wireless net- 
works are very limited and should be optimally used to improve resource utility. 
Thus, the association of various IoT devices with different BSs according to their
computation capabilities and states of the communication channel is a key problem
in DTWNs. As depicted in Fig. 4.6, the digital twins of IoT devices are constructed
and maintained by their associated BSs. The training data and the computation tasks
for training are distributed to various BSs based on the association between the
digital twins and the BSs.


Definition (Edge Association) Consider a DTWN with $N$ IoT users and $M$ BSs. For
any user $u_i$, $i \in N$, the goal of edge association is to choose the target BS $j \in M$
to construct the digital twin $DT_i$ of user $i$. The association $(DT_i, BS_j)$ is denoted as
$\Phi(i, j)$. If $DT_i$ is associated with BS $j$, then $\Phi(i, j) = D_i$, where $D_i$ is the size of
the data used to construct $DT_i$. Otherwise, $\Phi(i, j) = 0$.

A BS can be associated with multiple digital twins, whereas a digital twin can
only be associated with at most one BS; that is, $\sum_{j=1}^{M} \Phi(i, j) = D_i$. We perform edge
association according to the datasets $D_i$ of IoT users, the computation capability of
the BSs $f_j$, and the transmission rate $R_{i,j}$ between $u_i$ and BS $j$, denoted as


$$\Phi(i, j) = f(D_i, f_j, R_{i,j}). \tag{4.15}$$


The objective of the edge association problem is to improve the utility of resources 
and the efficiency of running digital twins in the DTWN. 
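The definition can be illustrated with a short Python sketch that builds the association values $\Phi(i, j)$ from a given assignment of twins to BSs and checks that each twin's data size appears in exactly one column. The function name `association` and the list-based representation are assumptions made for this example only.

```python
def association(assignment, data_sizes, num_bs):
    """Return Phi as a list of rows, one row per digital twin.

    assignment[i] is the index of the BS hosting twin i and
    data_sizes[i] is D_i, the data size used to construct twin i.
    """
    phi = [[0] * num_bs for _ in data_sizes]
    for i, (j, d_i) in enumerate(zip(assignment, data_sizes)):
        if not 0 <= j < num_bs:
            raise ValueError(f"twin {i} assigned to unknown BS {j}")
        phi[i][j] = d_i  # Phi(i, j) = D_i for the hosting BS, 0 otherwise
    return phi


# Three twins, two BSs: twins 0 and 2 on BS 0, twin 1 on BS 1.
phi = association([0, 1, 0], [5, 3, 2], num_bs=2)
# Each row sums to D_i because a twin is hosted by a single BS.
row_sums = [sum(row) for row in phi]
```

A BS may host several twins (column 0 above has two non-zero entries), while each row has a single non-zero entry.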

We use the weight matrix $A = [a_{i,k}]$ to represent the association relations between
the user devices and the digital twin servers, where $a_{i,k} = 1$ if the digital twin of user
$i$ is maintained by digital twin server $k$. Otherwise, $a_{i,k} = 0$. For example, in Fig. 4.5,
since the digital twin of $u_1$ is maintained by $DT_1$, we have $a_{1,1} = 1$ and $a_{1,2} = 0$. The
association matrix takes the form

$$A = \begin{bmatrix} a_{1,1} & \cdots & a_{1,M} \\ \vdots & \ddots & \vdots \\ a_{N,1} & \cdots & a_{N,M} \end{bmatrix}.$$

Now we start to derive the formulation of the edge association problem. We
consider that the gradient $\nabla f(w)$ of $f(w)$ is $L$-Lipschitz continuous (that is, $f$ is $L$-smooth):

$$\|\nabla f(w_{t+1}) - \nabla f(w_t)\| \le L \|w_{t+1} - w_t\|, \tag{4.16}$$

where $L$ is a positive constant and $\|w_{t+1} - w_t\|$ is the norm of $w_{t+1} - w_t$. We consider
that the loss function $f(w)$ is strongly convex; that is,

$$f(w_{t+1}) \ge f(w_t) + \langle \nabla f(w_t), w_{t+1} - w_t \rangle + \frac{1}{2}\|w_{t+1} - w_t\|^2. \tag{4.17}$$


Many loss functions for federated learning can satisfy the above assumptions, for
example, logistic loss functions. If (4.16) and (4.17) are satisfied, the upper bound of
the number of global iterations can be obtained as

$$\mathcal{T}(\theta_L, \theta_G) = \frac{O(\log(1/\theta_L))}{1 - \theta_G}, \tag{4.18}$$

where $\theta_L$ is the local accuracy, $\theta_G$ is the global accuracy, and
$0 < \theta_L, \theta_G < 1$. As in [94], we consider $\theta_L$ a fixed value, so that the upper bound
$\mathcal{T}(\theta_L, \theta_G)$ can be simplified to $\mathcal{T}(\theta_G) = \frac{1}{1 - \theta_G}$. If we denote the time of one local
training iteration by $T_{cmp}$, then the computation time in one global iteration is
$\log(1/\theta_L) T_{cmp}$, and the upper bound of total learning time is $\mathcal{T}(\theta_G) T_{glob}$, where $T_{glob}$
is the time cost of one global iteration.
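As a quick numerical illustration of these bounds, the sketch below evaluates the simplified bound $1/(1 - \theta_G)$ and the factor $\log(1/\theta_L)$. It assumes the natural logarithm and drops the constant hidden in the $O(\cdot)$ notation, neither of which is fixed by the text; the function names are hypothetical.

```python
import math


def global_iteration_bound(theta_g):
    """Simplified bound 1 / (1 - theta_G) on the number of global iterations."""
    if not 0.0 < theta_g < 1.0:
        raise ValueError("theta_g must lie in (0, 1)")
    return 1.0 / (1.0 - theta_g)


def local_iteration_factor(theta_l):
    """Factor log(1 / theta_L) multiplying the per-iteration time T_cmp.

    Natural logarithm assumed; the base only changes a constant factor.
    """
    if not 0.0 < theta_l < 1.0:
        raise ValueError("theta_l must lie in (0, 1)")
    return math.log(1.0 / theta_l)


# The bound grows quickly as theta_G approaches 1.
bounds = [global_iteration_bound(t) for t in (0.5, 0.9, 0.99)]
```

The steep growth of the bound near $\theta_G = 1$ is what couples the accuracy requirement to the total learning time.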


The time cost in our proposed scheme mainly consists of the following.

1. Local training on digital twins: The time cost for the local training of BS $i$ is
determined by the computing capability and the data size of its digital twins. The
time cost is

$$T_i^{cmp} = \frac{\sum_{j=1}^{K_i} b_j D_{DT_j} f^{C}}{f_i^{C}}, \tag{4.19}$$

where $f^{C}$ is the number of CPU cycles required to train one sample of data, $f_i^{C}$ is
the CPU frequency of BS $i$, and $b_j$ is the training batch size of digital twin $DT_j$.
2. Model aggregation on the BSs: The BSs aggregate their local models from various
digital twins. The computing time for local aggregation is

$$T_i^{la} = \frac{\sum_{j=1}^{K_i} |w_j| f_b^{C}}{f_i^{C}}, \tag{4.20}$$

where $|w_j|$ is the size of the local models and $f_b^{C}$ is the number of CPU cycles
required to aggregate one unit of data. Since all the clients share the same global
model, $|w_1| = |w_2| = \dots = |w_j| = |w_g|$. Thus the time cost for local aggregation
is

$$T_i^{la} = \frac{K_i |w_g| f_b^{C}}{f_i^{C}}. \tag{4.21}$$


3. Transmission of the model parameters: The local models aggregated by BS $i$ are
then broadcast to other BSs as transactions. The time cost is related to the number
of blockchain nodes and the transmission efficiency. Since other BSs also help to
transmit the transaction in the broadcast process, the time function is related to
$\log_2 M$, where $M$ is the size of the BS network. The required time cost is

$$T_i^{pt} = \xi \log_2 M \, \frac{K_i |w_g|}{R_i^{U}}, \tag{4.22}$$

where $\xi$ is a factor of the transmission time cost that can be obtained from the
historical running records of the transmission process.
4. Block validation: The block producer BS collects the transactions and packs them
into a block. The block is then broadcast to other producer BSs and validated by
them. Thus, the time cost is

$$T_{bp}^{v} = \xi \log_2 M_p \, \frac{S_B}{R_p^{D}} + \max_i \frac{S_B f^{v}}{f_i^{C}}, \tag{4.23}$$

where $M_p$ is the number of block producers and $S_B$ is the size of a block.
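The three non-negligible time components listed above can be sketched in Python as follows. The formulas follow the forms of (4.19), (4.22), and (4.23) as read here, so the code is only as reliable as that reading; all function and parameter names are hypothetical, and `cycles_per_unit` stands for a per-unit verification cost that the text does not define explicitly.

```python
import math


def local_training_time(batch_sizes, data_sizes, cycles_per_sample, cpu_freq):
    """Training work of all twins on one BS divided by its CPU frequency."""
    work = sum(b * d for b, d in zip(batch_sizes, data_sizes)) * cycles_per_sample
    return work / cpu_freq


def transmission_time(num_twins, model_size, uplink_rate, num_bs, xi):
    """Broadcast of K_i local models, scaled by xi * log2(M)."""
    return xi * math.log2(num_bs) * num_twins * model_size / uplink_rate


def block_validation_time(block_size, downlink_rate, cpu_freqs,
                          cycles_per_unit, num_producers, xi):
    """Block broadcast among producers plus the slowest producer's check."""
    broadcast = xi * math.log2(num_producers) * block_size / downlink_rate
    slowest_check = max(block_size * cycles_per_unit / f for f in cpu_freqs)
    return broadcast + slowest_check


t_cmp = local_training_time([0.5, 0.5], [100, 200], 10, 1000)
t_pt = transmission_time(2, 100, 400, 4, 1.0)
t_bv = block_validation_time(1000, 500, [100, 200], 1, 4, 0.5)
```

Here the batch sizes are treated as plain multipliers of the data sizes; whether they are fractions or sample counts is left open, as in the text.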


Note that, in the aggregation phase, the size of the model parameters $|w_g|$ is small
and the computing capability $f_i^{C}$ is high. Thus, compared to other phases, the time for
aggregation is very short, such that it can be neglected. Based on the above analysis,
the time cost for one iteration is denoted as

$$T_i = \log(1/\theta_L)\, T_i^{cmp} + T_i^{pt} + T_{bp}^{v}. \tag{4.24}$$

In the 6G network, the growth of the user scale, the ultra-low latency requirement
of communication, and the dynamic network status make the reduction of the time
cost of model training an important issue in various applications. Since accuracy
and latency are the two main metrics for evaluating the decision-making abilities of
digital twins in our proposed scheme, we consider the edge association problem to
find the trade-off between learning accuracy and the time cost of the learning process.
Due to the dynamic computing and communication capabilities of various BSs, the
edge association of digital twins (that is, how to allocate the digital twins of different
end users to various BSs for training) is a key issue to be solved to minimize the
total time cost. Moreover, increasing the training batch size $b_j$ of each digital twin
$DT_j$ can improve the learning accuracy. However, this also increases the learning
time cost, since more computations must be executed. In addition, how to allocate the
bandwidth resources to improve communication efficiency should be considered. In our edge
association problem, we should carefully design these policies to minimize the total
time cost of the proposed scheme. Thus, we formulate the optimization problem as
the minimization of the time cost of federated learning for a given learning accuracy.
To solve the problem, the association of digital twins, the batch size of their training
data, and the bandwidth allocation should be jointly considered according to the
computing capability $f_i^{C}$ and the channel state $h_{i,c}$. The optimization problem can
be formulated as


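$$\min_{K_i,\, b_j,\, \tau_{i,c}} \ \frac{1}{1 - \theta_G} \max\{T_1, T_2, \dots, T_M\} \tag{4.25}$$

$$\text{s.t.} \quad \theta_G \ge \theta_{th}, \quad \theta_G, \theta_{th} \in (0, 1), \tag{4.25a}$$

$$\sum_{i=1}^{M} K_i \le D, \quad K_i \in \mathbb{N}, \tag{4.25b}$$

$$\sum_{i=1}^{M} \tau_{i,c} \le 1, \quad c \in \mathcal{C}, \tag{4.25c}$$

$$b^{\min} \le b_j \le b^{\max}, \quad \forall j. \tag{4.25d}$$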

Constraint (4.25b) ensures that the sum of the number of associated digital twins
does not exceed the size of the total dataset. Constraint (4.25c) guarantees that each
subchannel can only be allocated to at most one BS. Constraint (4.25d) ensures the
range of the training batch size for each digital twin. Problem (4.25) is a combinatorial
problem. Since there are several products of variables in the objective function and
the time cost of each BS is also affected by the resource states of other BSs, problem
(4.25) is challenging to solve.
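To see why the problem is combinatorial, consider the following deliberately simplified Python sketch. It keeps only the local training term, ignores batch sizes and bandwidth, and searches exhaustively over all assignments of twins to BSs for the one that minimizes the maximum per-BS time. The function name `best_association` and this reduced cost model are assumptions for illustration; this is not the solution method of the chapter.

```python
from itertools import product


def best_association(data_sizes, cpu_freqs, cycles_per_sample):
    """Exhaustive search over all M**N assignments (tiny instances only)."""
    num_bs = len(cpu_freqs)
    best_cost, best_assign = float("inf"), None
    for assign in product(range(num_bs), repeat=len(data_sizes)):
        # Accumulate the CPU cycles each BS needs for its assigned twins.
        load = [0.0] * num_bs
        for d, j in zip(data_sizes, assign):
            load[j] += d * cycles_per_sample
        # The slowest BS determines the time of the iteration.
        cost = max(cycles / f for cycles, f in zip(load, cpu_freqs))
        if cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_assign, best_cost


assign, cost = best_association([4, 2, 2], [1.0, 1.0], 1.0)
```

With $N$ twins and $M$ BSs the loop visits $M^N$ assignments, so even this reduced version becomes intractable quickly, before the batch size and bandwidth variables are added.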


4.3.3 Multi-Agent DRL for Edge Association


Since the system states are only determined by network states in the current iteration 
and the allocation policies in the last iteration, we regard the problem as a Markov 
decision process and use a multi-agent DRL-based algorithm to solve it. 

The proposed multi-agent reinforcement learning framework is depicted in Fig. 
4.7. In our proposed system, each BS is regarded as a DRL agent. The environment 
consists of BSs and the digital twins of the end users. Our multi-agent DRL frame- 
work consists of multiple agents, a common environment, the system state S, the 
action A, and the reward function R, which are described below. 

Fig. 4.7 Multi-agent DRL for edge association


* State space: The state of the environment is composed of the computing capabilities
$f^{C}$ of the BSs, the number of digital twins $K_i$ on each BS $i$, the training
data size of each digital twin $D_j$, and the channel state $h_{i,c}$. The states of the
multiple agents are denoted as $s(t) = (f^{C}, K, D, h)$, where each dimension is a
state vector that contains the states for all the agents.

* Action space: The actions of BS $i$ in our system consist of the digital twin
allocation $K_i$, the training data batch sizes for its digital twins $b_i$, and the bandwidth
allocation $\tau_i$. Thus, the actions are denoted as $a_i(t) = (K_i, b_i, \tau_i)$. BS agent $i$
makes new action decisions $a_i(t)$ at the beginning of iteration $t$ based on system
state $s(t)$. The system action is $a(t) = (a_1, \dots, a_i, \dots, a_M)$.

* Reward: We define the reward function of BS $i$ according to its time cost $T_i$ based
on Eq. (4.24):

$$R_i(s(t), a_i(t)) = -T_i(t). \tag{4.26}$$

The reward vector of all the agents is $R = (R_1, \dots, R_M)$. According to Eq.
(4.25), the total time cost $T$ is decided by the maximum time cost of the agents
$\max\{T_1, T_2, \dots, T_M\}$. Each DRL agent in our scheme thus shares the same reward
function. In the training process, the BS agents adjust their actions to maximize
the reward function, that is, to minimize the system time cost in each iteration.
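The action tuple and the reward definitions above can be written down in a few lines of Python. This is a sketch under assumptions: the class name `Action`, its field names, and the two helper functions are hypothetical and only mirror the symbols $K_i$, $b_i$, $\tau_i$, and $T_i$ used in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Action:
    """Action a_i(t) = (K_i, b_i, tau_i) of one BS agent."""
    num_twins: int            # K_i: number of digital twins hosted by the BS
    batch_sizes: List[float]  # b_i: training batch sizes of its twins
    bandwidth: List[float]    # tau_i: bandwidth allocation over subchannels


def agent_reward(time_cost):
    """Per-agent reward of Eq. (4.26): the negative time cost."""
    return -time_cost


def shared_reward(time_costs):
    """Common reward when the slowest BS determines the total time cost."""
    return -max(time_costs)


action = Action(num_twins=2, batch_sizes=[0.5, 0.5], bandwidth=[0.3, 0.7])
reward = shared_reward([1.0, 3.0, 2.0])
```

Using the negative of the maximum time cost as the common reward means that an agent cannot improve its reward by lowering its own time cost while another BS remains the bottleneck.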


The learning process of BS $i$ is to find the best policy that maps its states to its
actions, denoted as $a_i = \pi_i(s)$, where $a_i$ is the action to be taken by BS $i$ for the
whole system state $s$. The objective is to maximize the expected reward, that is,

$$R_i = \sum_{t} \gamma^{t} R_i(s(t), a_i(t)), \tag{4.27}$$

where $\gamma$ is the discount rate, $0 \le \gamma < 1$. In the conventional DRL framework, it
is hard for an agent to obtain the states of others. In our DTWN, the states of the
digital twins and BSs are recorded in the blockchain. A BS can retrieve records from
the blockchain to obtain the system states and actions of other agents in the training
process. We use $\pi = [\pi_1, \pi_2, \dots, \pi_n]$ to denote the policies of the $n$ agents, whose
parameters are denoted as $\theta = [\theta_1, \theta_2, \dots, \theta_n]$. Thus we have the following policy
gradient for agent $i$:


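$$\nabla_{\theta_i} J(\pi_i) = \mathbb{E}_{o, a \sim \mathcal{D}}\Big[\nabla_{\theta_i} \pi_i(a_i | o_i)\, \nabla_{a_i} Q_i^{\pi}(\{o_i\}, a_1, \dots, a_n)\big|_{a_i = \pi_i(o_i)}\Big], \tag{4.28}$$

where $\{o_i\}$ is the observation of agent $i$, that is, the state of each agent. In our scheme,
since the placement of digital twins requires global coordination, we consider that
all the agents share the same system state through information exchange between
the servers. Agent $i$ determines its action $a_i$ through its actor deep neural network
(DNN) $\pi(s_t | \theta_{\pi_i})$, denoted as

$$a_i(t) = \pi_i(s_t | \theta_{\pi_i}) + \mathfrak{N}, \tag{4.29}$$

where $\mathfrak{N}$ is the random noise for generating a new action. The actor DNN is trained
as

$$\theta_{\pi_i} = \theta_{\pi_i} + \alpha_{\pi_i} \mathbb{E}\Big[\nabla_{a_i} Q(s_t, a_1, \dots, a_n | \theta_{Q_i})\big|_{a_i = \pi(s_t | \theta_{\pi_i})}\, \nabla_{\theta_{\pi_i}} \pi(s_t)\Big], \tag{4.30}$$

where $\alpha_{\pi_i}$ is the learning rate of the actor DNN.
The critic DNN of agent $i$ is trained as

$$\theta_{Q_i} = \theta_{Q_i} + \alpha_{Q_i} \mathbb{E}\Big[2\big(y_t - Q(s_t, a_i | \theta_{Q_i})\big)\, \nabla Q(s_t, a_1, \dots, a_n)\Big], \tag{4.31}$$

where $\alpha_{Q_i}$ is the learning rate, $y_t$ is the target value, and $(a_1, \dots, a_n)$ constitutes the
actions of the agents in our system.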

In the proposed algorithm, all the actor networks and critic networks are initialized
randomly. Then the replay memory is initialized
to store the experience samples of the training process. In each episode, each agent
selects its action based on its current observation of the state and obtains the reward for
that action. Then the new observation of the system state is obtained. The
experience tuple $(s_t, a_t, r_t, s_{t+1})$ is then stored in the replay buffer. Finally, the
agents train their critic and actor networks by sampling records from the
replay buffer.
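The replay memory used in this procedure can be sketched with the Python standard library alone. The class below is a minimal illustration, not the implementation used by the authors: the name `ReplayBuffer`, the fixed capacity, and uniform random sampling are assumptions, and the actor and critic updates of (4.30) and (4.31) are left out.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of experience tuples (s_t, a_t, r_t, s_{t+1})."""

    def __init__(self, capacity, seed=None):
        # A bounded deque discards the oldest experience when full.
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def store(self, state, actions, reward, next_state):
        self._buffer.append((state, actions, reward, next_state))

    def sample(self, batch_size):
        """Uniformly sample a mini-batch without replacement."""
        size = min(batch_size, len(self._buffer))
        return self._rng.sample(list(self._buffer), size)

    def __len__(self):
        return len(self._buffer)


buffer = ReplayBuffer(capacity=3, seed=0)
for t in range(5):
    # Placeholder states, joint actions, and rewards for illustration.
    buffer.store(t, (0, 1), -float(t), t + 1)
batch = buffer.sample(2)
```

Sampling from stored experience rather than from consecutive steps reduces the correlation between training samples, which is the usual motivation for a replay buffer in DRL.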

Chapter 5
Blockchain for Digital Twin 


Abstract Security and privacy are critical issues in digital twin edge networks. 
Blockchain, as a tamper-proof distributed database, is a promising solution for pro- 
tecting data and edge resource sharing in digital twin edge networks. In this chapter, 
we first introduce the architecture of the blockchain-empowered digital twin edge 
network, to show the integration angles of the blockchain and digital twin tech- 
niques. Then, we show the block generation and consensus process in the developed 
blockchain-empowered digital twin edge network architecture. 


5.1 Blockchain-Empowered Digital Twin 


Blockchain is a chain structure of data blocks arranged in chronological order that is 
essentially a tamper-proof distributed database that uses cryptography to ensure the 
security of each transaction in a decentralized manner. A blockchain is composed 
of peer-to-peer networks, distributed storage, consensus mechanisms, cryptography, 
and smart contracts. Therefore, a blockchain has the advantages of decentralization, 
tamper resistance, anonymity, public verifiability, and traceability [61, 62]. Inte- 
grating blockchain and digital twin provides security guarantee, trusted traceability, 
accessibility, and immutability of transactions in digital twin edge networks. Specifi- 
cally, building virtual twins and continuously updating twin models require core data 
that contain private user information, such as production parameters and users’ per- 
sonal information in industrial manufacturing and personal health data in healthcare. 
Therefore, the physical and virtual synchronization during virtual twin construction 
and maintenance needs to be recorded as transactions in the blockchain. This means 
the core data and edge resources can be governed by the blockchain in a decentralized 
and secure manner. In addition, virtual twins can simulate the behavioural features 
of physical components and generate virtual data. After the blockchain stores the 
generated virtual data in its distributed ledger, these virtual data become digital 
assets and their ownership can be proved. To achieve a secure and reliable digital 
twin edge network, the following blockchain-related security properties should
be satisfied.


* Security and trust: In the developed blockchain-empowered digital twin edge 
network, the transactions are audited and verified by a set of verifiers by utilizing 
consensus algorithms [63], unlike traditional transaction management, which 
depends on a central infrastructure. Thus, the blockchain can guarantee security 
and trust for the transactions in a digital twin edge network in a decentralized 
manner without a trusted intermediary. 

* Unforgeability and immutability: The decentralized authentication of transactions 
in the blockchain-empowered digital twin edge network ensures that no attacker 
can pose as a user to corrupt the blockchain. In addition, verifiers who execute the 
consensus algorithms are reluctant to misbehave or collude with each other, since 
all the verifiers’ identities are revealed to the users in a digital twin edge network 
and would be scrutinized for any misconduct. Furthermore, an attacker cannot 
modify the audited and stored transactions in the blockchain, since each block is 
embedded with the hash value of its previous block, which ensures immutability 
[64]. 

* Transparency and privacy protection: In the blockchain-empowered digital twin 
edge network, all the kinds of information recorded in the blockchain are trans- 
parent and openly accessible to all participants. Moreover, end users can change 
their identity (i.e. public key) after each transaction in the blockchain to protect 
their identity privacy. 

* Scalability and interoperability: Digital twins are digital replicas of physical
entities, enabling close monitoring, real-time interactions, and reliable
communications between digital space and physical systems. They provide rich information
to reflect the states of physical entities, to optimize the running of physical systems 
[65]. Therefore, the blockchain needs to provide scalability and interoperability 
for various digital twin services. The scalability of blockchain provides end users 
simultaneous access to the digital twin edge network. Meanwhile, the interoper- 
ability allows different digital replicas of physical entities to interact with each 
other seamlessly. 


The architecture of a blockchain-empowered digital twin edge network is shown 
in Fig. 5.1. In the physical plane, physical objects share information with each other. 
Various wireless/wired devices, such as sensors, radio frequency identification de- 
vices, actuators, controllers, and other tags can connect with IoT gateways, Wi-Fi 
access points, and base stations (BSs) supporting the communications between phys- 
ical objects. In the virtual plane, the digital twin edge servers provide the necessary 
computing capability to generate virtual twins of the physical objects, as well as 
model the channel conditions and data transmission among the physical objects. 
Moreover, a physical object realizes information transmission with a virtual twin 
through wireless communication technologies and shares the data in real time and 
accepts feedback from the virtual twin. In the blockchain plane, BSs are distributed 
in a specific area to work as verifiers. Specifically, if data or network resources are 
successfully shared between a requester and a provider in both the physical and
virtual planes, the requester should create a transaction record and send it to the nearest
BS. The BSs collect and manage local transaction records. The transaction records 
are structured into blocks after the consensus process among the BSs is completed, 
and then stored permanently in each BS. The detailed processes are as follows. 

Fig. 5.1 The architecture of a blockchain-empowered digital twin edge network


* System initialization: For privacy protection, each device needs to register a le- 
gitimate identity in the initialization stage. In the blockchain-empowered digital 
twin edge network, an elliptic curve digital signature algorithm and asymmetric 
cryptography are used for initialization. A device can obtain a legitimate identity 
after its identity has been authenticated. The identity includes a public key, a 
private key, and the corresponding certificate. 

* Role selection for devices: Devices in both the physical and virtual planes choose 
their roles (i.e. data or network resource requester and provider) according to their 
current available resources. Devices with surplus resources can become providers 
to provide services for requesters. 

* Transactions: The existence of connections in the digital twin edge network poses 
new challenges for security and privacy. Therefore, communications in the physi- 
cal and virtual planes can be considered transactions, which are recorded into the 
blockchain to achieve security and preserve privacy. Additionally, the blockchain 
can store the synchronization data between the physical and virtual planes, so that 
the data are secure and private. In addition, the blockchain enables edge servers 
belonging to different stakeholders to cooperatively process computation tasks 
without a trusted third party. 

* Building blocks: BSs collect all the transaction records within a certain period and 
then encrypt and digitally sign them to guarantee their authenticity and accuracy. 
The transaction records are structured into blocks, and each block contains a
cryptographic hash of the prior block. To verify the correctness of a new block, 
the consensus algorithm is used. In the proof-based consensus process, one of the 
BSs is selected as the leader for creating the new block. Because of broadcasts, 
each BS can access the entire transaction record and has the opportunity to be the 
leader. 

* The consensus process: The leader broadcasts the created block to the other BSs 
for verification and audit. All the BSs audit the correctness of the created block 
and broadcast their audit results. The leader then analyses the audit results and, if 
necessary, sends the block back to the BSs for another audit. 
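The block-building step, in which each block contains a cryptographic hash of the prior block, can be illustrated with a small Python sketch. It is a toy model under assumptions: blocks are plain dictionaries, the hash is SHA-256 over a JSON serialization, and digital signatures, encryption, and the consensus process are omitted. The function names are hypothetical.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 digest of a canonical JSON serialization of the block."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain, transactions):
    """Append a block that stores the hash of its predecessor."""
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({
        "index": len(chain),
        "prev_hash": prev_hash,
        "transactions": transactions,
    })


def chain_is_valid(chain):
    """Check that every block references the hash of the block before it."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain = []
append_block(chain, ["tx1"])
append_block(chain, ["tx2", "tx3"])
append_block(chain, ["tx4"])
valid_before = chain_is_valid(chain)

chain[0]["transactions"] = ["forged"]  # tamper with a stored block
valid_after = chain_is_valid(chain)
```

After the modification of the first block, the hash stored in the second block no longer matches, so the check fails. This is the mechanism behind the immutability property described earlier in this section.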

Fig. 5.2 The time sequence of the transaction confirmation procedure


5.2 Block Generation and Consensus for Digital Twin 


In the developed blockchain-empowered digital twin edge network, the distinct char- 
acteristics of the blockchain introduce unique challenges for data and edge resource 
sharing. Specifically, Fig. 5.2 illustrates the time sequence of T rounds of the trans- 
action confirmation procedure. In each round, the transaction confirmation time is 
divided into three time slots: a transaction collection slot, a block verification slot, 
and a resource sharing slot. For example, round $T_i$ consists of the slot $t_e$
for transaction collection, the slot $t_p$ for block verification, and the slot $t_g$ for resource
sharing among the edge devices. The requester can obtain the required resource from 
the provider only at the end of round 7;’s transaction confirmation procedure, that is, 
when the block is appended to the blockchain. In practice, it could take a long time 
to successfully finish the transaction confirmation procedure, since both transaction 
collection and block verification are executed in a dynamic and stochastic wireless 
transmission environment. Transactions of edge devices in congested areas might 
not be successfully transmitted to the verifiers for verification in the transaction 
collection phase, which will lead to failure of the transaction confirmation procedure.
On the other hand, the typical Nakamoto consensus protocol relies on proof
of work (PoW) [66], where the verifiers compete to solve a computationally difficult
cryptopuzzle. The fastest verifier that solves the cryptopuzzle will append its block
to the blockchain. Nevertheless, the cryptopuzzle-based PoW consensus
protocol consumes a large amount of computational and energy resources, which
makes it unsuitable for resource-constrained edge devices that cannot undertake
heavy computations. In addition, the audit and verification of the block among the
verifiers can encounter impediments due to traffic congestion in the network. Therefore,
the edge devices could suffer from long waiting times in resource sharing. A
carefully designed transaction confirmation procedure is thus necessary for secure
and privacy-protected resource sharing among edge devices while maintaining their
resource sharing efficiency.
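The computational burden of PoW can be illustrated with a toy cryptopuzzle in Python. The puzzle format below (find a nonce such that the SHA-256 digest of the data plus the nonce starts with a given number of zero hex digits) is an assumption chosen for illustration and is not the exact construction of any deployed blockchain; the function name `solve_puzzle` is hypothetical.

```python
import hashlib


def solve_puzzle(data, difficulty):
    """Brute-force search for a nonce giving `difficulty` leading zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{data}{nonce}".encode("utf-8")).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


# Difficulty 2 needs on the order of 16**2 = 256 hash evaluations on average.
nonce, digest = solve_puzzle("block", 2)
```

Each additional required zero digit multiplies the expected number of hash evaluations by 16, which is why this style of consensus is a poor match for resource-constrained edge devices.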

To enable edge devices to obtain the required resources in time, we present a block 
generation and consensus process in the developed blockchain-empowered digital 
twin edge network. The process is based on a relay-assisted transaction relaying 
scheme that facilitates transaction collection in congested areas, and a lightweight 
block verification scheme based on delegated proof of stake (DPoS) that is utilized 
to reduce the resource consumption of the verifiers during block verification. 


* In the transaction collection phase, the local verifiers periodically collect
transactions, verify the integrity and correctness of the transactions by validating their
signatures, and then pack a number of validated transactions into a block.

* In the block verification phase, the local verifiers that would like to add a block
to the blockchain send consensus requests to a verification set, which consists of
a set of preselected verifiers and which executes block verification and audit by using a
proof-based consensus protocol.


We develop two new schemes for transaction collection and block verification: 
(I) a relay-assisted transaction relaying scheme and (II) a DPoS-based lightweight 
block verification scheme. The work procedure for the developed schemes is shown 
in Fig. 5.3 and is illustrated in the subsequent discussion. 


5.2.1 Blockchain Model 


To enhance the security and reliability of digital twins from untrusted end users, 
the BSs act as blockchain nodes and maintain the running of the permissioned 
blockchain. The digital twins are stored in the blockchain and their data are updated 
as the states of the corresponding users change. The local models of the BS, also 
stored in the blockchain, can be verified by other BSs to ensure their quality. Thus, 
there are three types of records, namely, digital twin model records, digital twin data 
records, and training model records. 

The overall digital twin blockchain scheme is shown in Fig. 5.4. The BSs first 
train the local training models on their own data and then upload the trained models 
to the MBS. The trained models are also recorded as blockchain transactions and are 
broadcast to the other BSs for verification. The other BSs collect the transactions and 
pack them into blocks. The consensus process is executed to verify the transactions 
in blocks.

Fig. 5.4 The blockchain scheme for federated learning

Our consensus process is executed based on the DPoS protocol, where the
stakes are the training coins. The initial training coins are allocated to BS $i$ according
to its data from digital twins, denoted as

$$S_i = \frac{\sum_{j=1}^{K_i} D_{DT_j}}{\sum_{k=1}^{M} D_k}\, S_{ini}, \tag{5.1}$$

where $S_{ini}$ is an initial value and $K_i$ is the number of digital twins associated with
BS $i$.
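The allocation of initial training coins can be sketched as a proportional split in Python. The sketch assumes that the denominator of Eq. (5.1) is the total data held by all BSs and that the initial value is shared in proportion to each BS's data; the function name `initial_training_coins` and the nested-list input format are hypothetical.

```python
def initial_training_coins(twin_data_per_bs, initial_value):
    """Split `initial_value` among BSs in proportion to their twins' data.

    twin_data_per_bs[i] lists the data sizes of the twins hosted by BS i.
    """
    totals = [sum(sizes) for sizes in twin_data_per_bs]
    grand_total = sum(totals)
    if grand_total == 0:
        raise ValueError("at least one BS must hold data")
    return [initial_value * total / grand_total for total in totals]


# BS 0 hosts twins with 10 and 30 units of data, BS 1 hosts one twin with 60.
coins = initial_training_coins([[10, 30], [60]], 100.0)
```

Under this reading, a BS that maintains more digital twin data starts with more voting weight in the DPoS election.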

The coins of each BS are then adjusted according to their performance in the 
training process. If the trained model of a BS passes the verification of the other BSs 
and the MBS, the coins will be awarded to the BS. Otherwise, the BS will receive no 
pay for its training work. A number of BSs are elected as block producers by all the 
BSs. In the voting process, all the BSs vote for the candidate BSs by using their own 
training coins. The elected BSs take turns to pack the transactions in a time interval 
T into a block B and broadcast the block B to other producers for verification. 

In our proposed scheme, we leverage blockchain to verify the local models before 
embedding them into the global model. Due to high resource consumption required 
for block verification, the interval T should be set to multiple times the local training 
period; that is, the BSs execute multiple local training iterations before transmitting 
the local models to the MBS for global aggregation. 


5.2.2 Relay-Assisted Transaction Relaying Scheme 


As shown in Fig. 5.3, a requester that requires resources first sends a request to a 
nearby provider (Step 1). Then the provider generates a transaction (Step 2) and 
broadcasts its transaction for verification. A provider that would like to verify its 
transaction in a congested area first sends a transaction relaying request. To facilitate
peer discovery and reduce interference, the local verifier coordinates the transaction 
relaying link establishment; that is, the verifier selects a device near the provider as 
a relay device to assist in transaction relaying and reuses the channels of end users 
located far away in a different area (Step 3). We consider the time division multiple 
access technique in the transaction relaying transmission. However, the existence of 
a neighbour might not imply the stable establishment of the transaction relaying link, 
since the neighbour might not be willing to participate in transaction relaying due to 
the associated overhead, such as energy and bandwidth consumption. Thus, the local 
verifier needs to pay the relay devices a relay fee to motivate them to participate in 
transaction relaying. 


5.2.3 DPoS-Based Lightweight Block Verification Scheme 


DPoS has been demonstrated as a high-efficiency consensus protocol with moderate 
cost in which a part of the delegates (i.e. verifiers) are selected based on their stakes to 
perform the consensus process. Here, the stake is the accumulated time during which 
a delegate possesses its assets before using them to generate a new block. DPoS has 
been used in real scenarios, such as enterprise operation systems, BitShares, and 
the Internet of Vehicles. It is reasonable to consider that the DPoS can be utilized 
in the digital twin edge network and to develop a DPoS-based lightweight block 
verification scheme. Unlike computation-intensive PoW, the designed DPoS-based 
lightweight block verification scheme can leverage the stake of the verifiers as a 
mining resource to generate a block. The more stakes a verifier has, the higher its 
probability of finding a solution to generate a block. In addition, the verification and 
audit of the generated block are executed by only some of the preselected verifiers, 
thus keeping the computational complexity reasonably low. As shown in Fig. 5.3, 
the main steps in the DPoS consensus protocol in our lightweight block verification 
scheme involve verifying candidate generation, verifier selection, the consensus
process, and transaction fee payment.


5.2.3.1 Verifying Candidate Generation 


A device that wants to become a verifier candidate first submits a deposit of stake to an account
under public supervision. This deposit will be confiscated if the verifier behaves
badly during a consensus process, for example, if it fails to generate a block in its
turn or if it generates false block verification results.


5.2.3.2 Verifier Selection


The blockchain users, that is, the end users possessing stakes, download the opinions 
of the candidates from the blockchain (Step 4) and vote for their preferred candidates 
according to some criteria, for example, voting for candidates that can generate 
and verify a block quickly (Step 5). A blockchain user can vote for more than one 
candidate and can also persuade others to vote for their favourite candidates. The top 
k candidates with the most votes are selected to form a verification set, where k is 
an odd integer, such as 21 in enterprise operation systems. The k verifiers all take 
turns acting as the block manager during k block verification subslots. 
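The selection of the top $k$ candidates and the rotation of the block manager can be sketched in Python as follows. Weighting each vote by the voter's stake is an assumption made here for illustration, since the text only states that users possessing stakes vote; the function names and the tie-breaking rule (alphabetical order of candidate identifiers) are likewise hypothetical.

```python
def select_verifiers(votes, k):
    """Return the k candidates with the largest stake-weighted vote totals.

    votes is a list of (voter_stake, [candidate ids]) pairs; a voter may
    support more than one candidate.
    """
    tally = {}
    for stake, candidates in votes:
        for candidate in set(candidates):  # ignore duplicate votes
            tally[candidate] = tally.get(candidate, 0.0) + stake
    # Sort by descending votes, then by identifier to break ties.
    ranked = sorted(tally, key=lambda c: (-tally[c], c))
    return ranked[:k]


def block_manager(verifiers, subslot):
    """Verifiers take turns as block manager, one per verification subslot."""
    return verifiers[subslot % len(verifiers)]


votes = [(5.0, ["a", "b"]), (3.0, ["b"]), (1.0, ["c"])]
verifiers = select_verifiers(votes, k=2)
schedule = [block_manager(verifiers, s) for s in range(3)]
```

In this example candidate `b` collects 8 units of stake and `a` collects 5, so both enter the verification set, and they alternate as block manager.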


5.2.3.3 Consensus Process 


In each block verification subslot, the block manager carries out block management 
in its own consensus process round (Step 6). Specifically, the block manager first 
broadcasts the unverified block to other verifiers for verification and audit. Then, 
each verifier locally verifies the signature of each transaction in the block and sends
its audit result, together with its signature, to the other verifiers. Following the reception of the audit results,
each verifier compares its audit result with those of the other verifiers and sends
a commit message to the block manager. Considering Byzantine fault tolerance
consensus conditions, the block manager sends the current audited block to all the
verifiers and the local verifiers for storage (Step 7), provided that it receives a commit
message from more than two-thirds of the verifiers. Finally, the provider provides
the required resources to the requester (Step 8).
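The commit condition used by the block manager can be expressed as a one-line check. The sketch below uses integer arithmetic to test whether strictly more than two-thirds of the verifiers have sent a commit message; the function name `block_is_committed` is hypothetical.

```python
def block_is_committed(num_commits, num_verifiers):
    """True if strictly more than two-thirds of the verifiers committed."""
    if num_verifiers <= 0:
        raise ValueError("num_verifiers must be positive")
    # Integer comparison avoids floating-point rounding at the threshold.
    return 3 * num_commits > 2 * num_verifiers


# With 21 verifiers, 14 commits are exactly two-thirds and are not enough.
exactly_two_thirds = block_is_committed(14, 21)
one_more = block_is_committed(15, 21)
```

With a verification set of 21 verifiers, at least 15 commit messages are therefore needed before the audited block is distributed for storage.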


5.2.4 Conclusion 


We first presented the architecture of a blockchain-empowered digital twin edge 
network that consists of a virtual plane, a blockchain plane, and a physical plane. 
Then, we illustrated the processes of the developed architecture and showed the 
integration angles of the blockchain and the digital twin edge network. Further- 
more, we presented the block generation and consensus process in the developed 
blockchain-empowered digital twin edge network architecture. 

Chapter 6
Digital Twin for 6G Networks 


Abstract Digital twin is a technology that has the potential to help sixth-generation
(6G) networks to realize digitization. In this chapter, we first introduce the
combination of digital twin and 6G and then discuss two key use cases: digital twins
combined with reconfigurable intelligent surfaces, and digital twins for stochastic
offloading.


6.1 Integration of Digital Twin and Sixth-Generation (6G) 
Networks 


To meet the ever-increasing demands of user traffic, fifth-generation (5G) networks
integrate several novel network architectures, such as edge computing, software- 
defined networking, network function virtualization, and ultra-dense heterogeneous 
networks, to realize performance improvements for peak rates, transmission latency, 
network energy efficiency, and other indicators. However, the rapid proliferation 
and breakneck expansion of 5G wireless services also pose new challenges on 
transmission data rates, ubiquitous coverage, reliability, and network intelligence 
[67]. These challenges are spurring activities focused on defining the next-generation 
6G wireless networks. Compared with 5G, 6G networks are envisioned to achieve
superior performance in the following areas [68, 69].


* Peak data rate: The peak data rate is the highest data rate under ideal channel 
conditions where all available radio resources are completely assigned to a single 
mobile device. Driven by both user demand and technological advances such as 
terahertz communications, peak data rates are expected to reach up to 1 Tbps, 10 
times that of 5G. 

* Latency: Latency can be distinguished as the user plane and control plane latency.
The minimum latency requirement for the user plane in 5G is 1-4 ms. This value is
envisioned to be further reduced in 6G to 100 μs or even 10 μs. The minimum
latency for the control plane should be 10 ms in 5G and is also expected to be
remarkably improved in 6G.

* Mobility: The highest mobility supported by 5G is 500 km/h. In 6G, the maximal 
speed of 1,000 km/h is targeted to meet the requirements of commercial airline 
systems. 

* Connection density: The minimum number of devices with a relaxed quality of
service in 5G is $10^6/\mathrm{km}^2$. In 6G, the connection density is envisioned to be further
improved by 10 times, to $10^7/\mathrm{km}^2$.

* Energy efficiency: Energy efficiency is an important metric to enable cost-efficient 
wireless networks for green communications. In 6G, network energy efficiency 
is expected to increase 10 to 100 times compared to that in 5G. 

* Signal bandwidth: The requirement for bandwidth in 5G is at least 100 MHz, and 
6G will support up to 1 GHz for operations in higher frequency bands, and even 
higher in terahertz communications. 


Beyond imposing new performance metrics, emerging trends that include new 
services and the recent revolutions in artificial intelligence (AI), computing, and 
sensing will redefine 6G. Digital twin, as one of the emerging technologies for next- 
generation network digitalization, can pave the way for the creation of future digital 
6G by transforming and precisely mapping physical networks to digital networks with 
virtual twins. Digital twin will provide three main benefits for 6G. First, digital twin 
can provide a comprehensive and accurate network analysis for 6G with increasingly 
accurate and synchronous network updates. Second, digital twin can build a virtual 
twin layer between the physical entities and user applications. This can establish a 
bridge between the bottom network and the top application with better cross-layer 
interaction and timely user experience feedback. Third, digital twin-enabled 6G can 
utilize AI algorithms to adjust network schedules, such as task offloading, resource 
allocation, and network management. Thus, digital twin is an essential technique for 
6G in terms of supporting network automation and intelligence. 


6.2 Potential Use Cases 


Several works have explored utilizing digital twins to enhance the performance 
of next-generation communication networks. In [70], the authors proposed digital 
twin-enabled 6G to enable network scalability and reliability. The authors in [71] 
analysed the potential of digital twin for next-generation communication networks 
in terms of radio access, channel emulation, and network optimization. These works 
discussed how digital twin could be a powerful tool to fulfil the potential of 6G. 
Next, we present three detailed use cases of the combination of digital twin and 6G. 


* Reconfigurable intelligent surface (RIS) technology and digital twin: With the
dense deployment of edge servers, there will be increasing data transmission
requirements in the next-generation networks, which will aggravate network
interference and increase transmission delays. Current massive multiple-input
multiple-output (MIMO) and millimetre wave technologies can increase wireless
communication data rates, but these can incur high hardware costs and complicated
signal processing issues. RIS is a new technology for 6G that can enhance spectral
efficiency and suppress interference in wireless communications by adaptively
configuring massive low-cost passive reflecting elements. However, to improve
wireless transmission rates, RIS requires both the amplitude and phase of passive
reflecting elements to be adjusted to facilitate an enhanced signal propagation
environment. Since virtual twins can record the real-time states of physical objects,
monitor the dynamic changes of wireless networks, and carry out optimization and
prediction to improve the performance of the physical system, RIS can utilize digital
twin to extract the key features of RIS components, such as the number of RIS
elements, the phase and amplitude of the reflecting elements, and the mobile devices
served by each RIS element. With the extracted information, digital twin can assist
RIS in adjusting the wireless propagation environment to improve the signal-to-noise
ratio and decrease the probability of outages.

* Edge association and digital twin: The huge number of connected devices and 
the heterogeneous network structure of 6G pose great challenges for constructing 
digital twins in each network’s infrastructure. A possible solution for this issue is 
to select a subset of base stations as the digital twin servers to maintain the digital 
twins at reduced time cost and energy consumption, instead of maintaining digital 
twins at every base station (BS). To achieve this, the edge association problem must 
be addressed. The objective of edge association is to minimize the average system 
latency while providing delay-guaranteed service for each user. According to the 
running phases of digital twins, edge association consists of two subproblems: 
the digital twin placement problem and the digital twin migration problem. The 
digital twin placement problem involves how to choose the optimized subset of 
BSs as digital twin servers. The digital twin migration problem involves how
to allocate network resources to ensure relatively low transmission overhead and 
communication latency in the process of digital twin migration. 

* Cellular vehicle to everything (C-V2X) and digital twin: The rapid development of 
wireless communications and C-V2X has facilitated the wide use of smart vehicles 
and enriched many intelligent transportation system applications, such as smart 
navigation, road condition recognition, high-precision real-time mapping, forward 
collision warning, and driving assistance. However, due to the high mobility of 
vehicles, it is difficult to test C-V2X functionalities and performance for typical 
V2X use cases. Digital twin can provide a high-fidelity digital mirror of C-V2X 
systems throughout their entire life cycle [72]. By using digital twin mapping, the 
predicted state of automatic driving vehicles can be realized based on a virtual 
simulation test environment. Based on the prediction information of digital twin, 
driving behaviours and emergency events can be more accurately determined and
quickly perceived. 

Fig. 6.1 Digital twin-empowered RIS framework


6.3 Digital Twin for RIS 


To support emerging applications, 6G networks deploy computation/storage capabil- 
ities at BSs to avoid long transmission delays from mobile devices to cloud servers. 
However, while this shortens the distance and delay to access cloud server resources, 
it does not improve the wireless propagation environment. The recently proposed 
RIS technology can enhance spectral efficiency and suppress interference by adjusting
both the amplitude and phase of passive reflecting elements. Digital twin can
assist RIS in intelligently adjusting the passive reflecting elements.


6.3.1 System Model 


To clearly illustrate the combination of digital twin and RIS, we present a hierarchical 
digital twin-empowered RIS framework, as shown in Fig. 6.1. In this framework, 
edge resources can alleviate the heavy computational pressure of mobile devices, 
and edge servers can reduce task processing latency due to their proximity to mobile 
devices. RIS can enhance the quality of wireless communication links in the process 
of task offloading by intelligently altering the radio propagation environment. 

The proposed framework consists of two layers: an RIS-aided communication 
layer and a digital twin-empowered virtual layer. In the RIS-aided communication 
layer, RIS elements are distributively installed on the surface of building facades, to 
improve propagation conditions and increase the quality of wireless communications. 
The digital twin-empowered virtual layer is constructed by diverse distributed edge 
servers. With edge resources and AI algorithms, virtual twins can construct a real-time
mirror of the physical network to enable intelligent policy design, quality
of service requirements, resource management, and network topology monitoring. 
This is a general framework that can improve the communication and computational 
performance in many scenarios, including cellular, vehicular, and unmanned aerial 
vehicle networks. 


6.3.2 Computation Offloading in Digital Twin-Aided RIS 


To elaborate on how digital twin assists in RIS coefficient adjustment, in this section, 
we present a case study that focuses on RIS-aided offloading. We consider a network 
of digital twin-aided RIS offloading that consists of a physical network entities 
layer and a digital twin-empowered virtual layer. The physical network entities layer 
contains three types of physical entities: base stations, RISs, and mobile devices. 
Since digital twin mirrors a physical entity, the digital twin-empowered virtual layer
also contains three types of virtual models. The first type of virtual model involves 
the BSs. We consider that each physical BS has multiple antennas and an edge server 
for providing edge computing via wireless communications. The virtual model of 
a BS is equipped with edge intelligence and can thus predict current available communication,
computing, and caching resources and monitor current wireless links to construct the 
current network topology. The second type of virtual model involves RIS, including 
the number of RIS elements and the phase and amplitude of reflecting elements. The 
key function of this virtual model is to adjust the RIS coefficients. The third type 
of virtual model involves mobile devices. This type of virtual model mainly records 
the size of the collected data, the current locations of the mobile devices, and the 
latency or computational resource requirements of on-device applications. 

Task offloading aims to offload the computation-intensive tasks of mobile devices
to nearby distributed BSs for processing. The virtual model of each mobile device
records the computation-intensive task as $(d_k, c_k)$, where $d_k$ is the data size of task
$k$ and $c_k$ is the computation resource required to process one bit. The virtual
model needs to determine what part of the task should be processed locally and
how much should be offloaded to the edge server to process. We define this as the
offloading ratio (i.e. $x_k$). RIS offloading utilizes RIS to assist in task offloading for a
higher wireless communication rate. Different from traditional wireless transmission
links, which only include direct device-BS links, the wireless transmission link
in RIS offloading includes both direct device-BS links and reflected device-RIS-BS
links. For the device-BS link, the virtual model of the BS records its channel
vector, that is, $h_k^{d}$. The reflected device-RIS-BS link contains three components:
the device-RIS link, the RIS reflection with phase shifts, and the RIS-BS link. The
virtual RIS model records the channel vectors of the device-RIS link and RIS-BS
link as $h_k^{r}$ and $h^{b}$, respectively. The RIS reflection coefficients are denoted as
$\Theta = \mathrm{diag}(\beta_1 e^{j\theta_1}, \beta_2 e^{j\theta_2}, \ldots, \beta_N e^{j\theta_N})$, where $\beta_n$ and $\theta_n$ are the amplitude and phase
shift of the $n$th RIS element, respectively. The effective channel gain can be expressed
as


$$g_k = h_k^{d} + h^{b} \, \Theta \, h_k^{r}. \qquad (6.1)$$


Based on the channel gain $g_k$, the maximum achievable wireless transmission data
rate $R_k$ can be obtained from (6.2), where $B$ is the system's bandwidth. The virtual RIS
model should properly adjust the reflection coefficients to improve the wireless
communication rate.
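
As a sketch of how the reflection coefficients influence the rate, the snippet below evaluates the effective channel for a single-antenna BS and plugs it into a Shannon-type rate. Several things here are assumptions rather than statements from the text: the scalar (single-antenna) channels, the rate form $B \log_2(1 + p |g|^2 / \sigma^2)$ with transmit power `tx_power` and noise power `noise_power`, and the phase-alignment heuristic `aligned_phases`, which is a simple stand-in and not the learning-based adjustment discussed in this chapter.

```python
import cmath
import math


def effective_gain(h_direct, h_dev_ris, h_ris_bs, amplitudes, phases):
    """Effective channel: direct path plus the sum of RIS-reflected paths.

    Each reflected path is h_ris_bs[n] * beta_n * exp(j*theta_n) * h_dev_ris[n],
    i.e. the product of RIS-BS channel, reflection coefficient, and device-RIS
    channel written out element by element.
    """
    reflected = sum(
        hb * beta * cmath.exp(1j * theta) * hr
        for hb, beta, theta, hr in zip(h_ris_bs, amplitudes, phases, h_dev_ris)
    )
    return h_direct + reflected


def aligned_phases(h_direct, h_dev_ris, h_ris_bs):
    """Phase shifts in [0, 2*pi) that make every reflected path add in phase
    with the direct path (a simple heuristic used only for illustration)."""
    target = cmath.phase(h_direct)
    return [
        (target - cmath.phase(hb * hr)) % (2 * math.pi)
        for hb, hr in zip(h_ris_bs, h_dev_ris)
    ]


def achievable_rate(gain, bandwidth_hz, tx_power, noise_power):
    """ASSUMED Shannon form: B * log2(1 + p * |g|^2 / sigma^2)."""
    return bandwidth_hz * math.log2(1.0 + tx_power * abs(gain) ** 2 / noise_power)
```

With aligned phases and unit amplitudes, the magnitudes of all paths add up, so the rate with RIS is never lower than the rate over the direct link alone.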

The task execution latency is determined by the local computation and task
offloading. The latency of local computation is mainly related to the computational
capability of each mobile device (i.e. $f_k^{l}$). The latency of task offloading involves
the task transmission time and edge computation time. Since local computation and
task offloading are executed in parallel, the total task execution latency is equal to
the maximal value of the two processes. To minimize the total task execution latency,
the RIS configuration, offloading ratio, and computation resource allocation must be
jointly optimized. The RIS offloading problem can be formulated as


$$\min_{x_k, f_k^{e}, \beta_n, \theta_n} \; \sum_{k \in \mathcal{K}} \max\left\{ \frac{(1 - x_k) d_k c_k}{f_k^{l}}, \; \frac{x_k d_k c_k}{f_k^{e}} + \frac{x_k d_k}{R_k} \right\}$$
$$\text{s.t.} \quad \sum_{k \in \mathcal{K}} f_k^{e} \le F^{e}, \; 0 \le f_k^{e} \le F^{e}, \; k \in \mathcal{K}, \qquad (6.3a)$$
$$x_k, \beta_n \in [0, 1], \; k \in \mathcal{K}, \; n \in \mathcal{N}, \qquad (6.3b)$$
$$0 \le \theta_n \le 2\pi, \; n \in \mathcal{N}, \qquad (6.3c)$$


where $f_k^{e}$ and $F^{e}$ are the computation resource that the BS allocates to task $k$
and the total computation resource of the BS, respectively. Constraint (6.3a) is the
computation resource allocation constraint. Constraints (6.3b) and (6.3c) are the value
ranges of the offloading ratio and amplitude variables and of the phase shift variables,
respectively. Since the digital twin-empowered virtual layer has AI ability, we can use
AI, such as deep reinforcement learning (DRL), to solve the complex optimization
problem. We first reformulate the above optimization problem as a DRL problem with
a system state, action, and reward. The state has five components:


$$s(t) = \{d_k(t), c_k(t), f_k^{l}(t), F^{e}, \Theta(t)\}. \qquad (6.4)$$


In the environment, the BS assembles the information as a state and sends it to the 
DRL agent. The action has four parts, which are the variables of the optimization 
problem: 


$$a(t) = \{x_k(t), f_k^{e}(t), \beta_n(t), \theta_n(t)\}. \qquad (6.5)$$


Based on the state and action, the agent obtains a reward $R^{imm}(s(t), a(t))$
from the environment, where the reward is related to the objective function. In this
scenario, the total task execution latency can be regarded as the reward function.
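
The latency objective and its use as a reward can be sketched as follows. This is an illustration under stated assumptions: the function and parameter names are hypothetical, the reward is taken as the negative total latency so that a higher reward is better (a sign convention assumed here), and `balanced_ratio` is a closed-form helper for a single task with fixed rate and resources, not the joint optimization described above.

```python
def task_latency(x, d, c, f_local, f_edge, rate):
    """Latency of one task when a fraction x of it is offloaded.

    x: offloading ratio in [0, 1]; d: data size (bits); c: cycles per bit
    f_local / f_edge: cycles per second at the device / allocated by the BS
    rate: uplink transmission rate (bits per second)
    """
    local = (1.0 - x) * d * c / f_local
    offload = x * d / rate + x * d * c / f_edge
    return max(local, offload)  # local and offloaded parts run in parallel


def balanced_ratio(d, c, f_local, f_edge, rate):
    """Ratio at which local and offloaded parts finish at the same time."""
    local_full = d * c / f_local
    offload_full = d / rate + d * c / f_edge
    return local_full / (local_full + offload_full)


def reward(tasks):
    """ASSUMED sign convention: negative total latency over all tasks."""
    return -sum(task_latency(**task) for task in tasks)
```

For a fixed rate and fixed resources, the maximum of a decreasing and an increasing linear function of the ratio is smallest where the two are equal, which is why the balanced ratio minimizes the latency of a single task.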

Fig. 6.2 Cumulative task execution latency under different schemes


Based on the state, action, and reward, we exploit asynchronous actor-critic DRL 
to solve the formulated problem [73]. Asynchronous actor-critic DRL consists of a 
global agent and several local agents. The global agent accumulates all the parameters 
of the neural networks from the local agents. Each local agent has an actor neural 
network and a critic neural network. The actor neural network is for generating 
actions and the critic neural network is for evaluating the performance of the action 
generated by the actor neural network. At each training step, the parameter of the 
actor neural network is updated based on 


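$$\theta_\pi \leftarrow \theta_\pi + \alpha_\pi \sum_t \nabla_{\theta_\pi} \log \pi(s(t)|\theta_\pi) \big( R^{imm}(s(t), a(t)) + \delta v_{\theta_v}(s(t+1)) - v_{\theta_v}(s(t)) \big), \qquad (6.6)$$


where $\alpha_\pi$ is the learning rate of the actor network, $\delta$ is the discount factor, $v_{\theta_v}(\cdot)$ is
the state value estimated by the critic neural network, and $\pi(s(t)|\theta_\pi)$ is the output of the
actor neural network. The parameter of the critic neural network is updated based on


$$\theta_v \leftarrow \theta_v + \alpha_v \sum_t \nabla_{\theta_v} \big( R^{imm}(s(t), a(t)) + \delta v_{\theta_v}(s(t+1)) - v_{\theta_v}(s(t)) \big), \qquad (6.7)$$


where $\alpha_v$ is the learning rate of the critic network.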
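
To make the roles of the actor and the critic concrete, the following is a deliberately simplified, single-worker sketch. It is not the asynchronous algorithm of [73]: it assumes a small discrete action set with a tabular softmax policy, a tabular value function, and the standard semi-gradient TD(0) rule for the critic. All names and default step sizes are hypothetical.

```python
import math


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(theta, v, s, a, r, s_next,
                      alpha_pi=0.1, alpha_v=0.1, delta=0.9):
    """One update of a tabular actor (theta[s][a]) and critic (v[s]).

    The temporal-difference error r + delta * v[s_next] - v[s] weights the
    gradient of log pi(a|s) for the actor and drives the critic towards the
    bootstrapped target.
    """
    td_error = r + delta * v[s_next] - v[s]
    probs = softmax(theta[s])
    for b in range(len(theta[s])):
        # Gradient of log softmax w.r.t. preference b: 1[a == b] - pi(b|s).
        grad_log = (1.0 if b == a else 0.0) - probs[b]
        theta[s][b] += alpha_pi * grad_log * td_error
    v[s] += alpha_v * td_error  # semi-gradient TD(0) critic update
    return td_error
```

If the latency is used as the learning signal, one would pass `r` as the negative latency (an assumption), so that actions leading to shorter latency receive a larger temporal-difference error and become more probable.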

Figure 6.2 shows the total task execution latency of computation offloading under 
different RIS configuration schemes. First, we can see that the proposed DRL- 
based computation offloading algorithm converges in all cases and the cumulative 
task execution latency decreases with the number of episodes. Further, the offloading
latency with RIS aid is lower than the latency without RIS aid. The reason is
that RIS offloading can achieve a higher transmission data rate, thus resulting in a
lower transmission latency. In addition, the offloading latency with optimized RIS
configuration is the lowest due to the optimal adjustments of the RIS amplitude and
phase shift.


Fig. 6.3 Illustration of a digital twin network (panels: physical network, digital twin)


6.4 Stochastic Computation Offloading 


To improve task processing efficiency and prolong the battery lifetime of mobile 
devices, computation offloading is a promising approach that can offload the collected 
data and computation tasks to distributed BSs for processing. However, current 
research focusing on computation offloading assumes that each device executes a 
single computation task, without considering the randomness of task arrivals. Policies
designed under this assumption cannot be applied to a network with a stochastic
task arrival model. Since digital twin is a powerful technology that can monitor and 
analyse the dynamic changes of physical objects, in this section, we utilize digital 
twin to construct virtual models of the physical objects and solve the stochastic 
computation offloading problem considering dynamic changes of the task queue. 


6.4.1 System Model 


We consider a digital twin network consisting of a physical network and its digital 
twin. As shown in Fig. 6.3, the physical network has three major components: 
distributed mobile devices, small base stations (SBSs), and a macro base station
(MBS). Each device collects data from sensors and on-device applications, and the
collected data must be analysed in real time. Since data analysis is computation-intensive,
devices with limited computation capability and battery power might
not be able to conduct the data analysis in a timely manner. So, the devices must 
offload these tasks to edge servers for a high quality of computational experience. 
Digital twins contain the virtual models of the physical elements. Virtual models 
not only mirror the characteristics of the physical elements/system, but also make 
predictions, simulate the system, and can play a crucial role in policy design and 
resource allocation. In the network, digital twin can be utilized [74] to 1) construct 
the network topology of the physical network; 2) monitor network parameters and 
models, that is, dynamic changes of resources and stochastic task arrival processes, 
and 3) optimize offloading and resource allocation policy. 


6.4.2 Stochastic Computation Offloading: Definition and Problem 
Formulation 


Based on digital twin, the digital representation (i.e. virtual models) of the physical
network (i.e. the real world) is created. The virtual models here comprise the wireless
network topology, the communication model between the devices and BSs, and the 
stochastic task queueing model. 

(1) Network topology in the digital twin network 

Digital twin first models the physical network as a graph $G = (\mathcal{U}, \mathcal{B}, \varepsilon)$, where
$\mathcal{U} = \{u_1, \ldots, u_N\}$ and $\mathcal{B} = \{b_0, b_1, \ldots, b_M\}$ are, respectively, the sets of devices and
BSs (where $b_0$ is the index for the MBS, and the other values are the indexes for the
SBSs). The term $\varepsilon$ is the edge information, that is, the connections between the
devices and BSs.

Then, the digital twin uses a 3-tuple $DT_i(t)$ to characterize devices, that is,
$DT_i(t) = \{P_{i,\max}(t), l_i(t), f_i^{l}\}$, where $P_{i,\max}(t)$ denotes the maximal transmission
power in time slot $t$, $l_i(t)$ denotes the current location of $u_i$, and $f_i^{l}$ denotes the
computation resources of the local server. Similarly, the digital twin uses a 3-tuple
$DT_j(t)$ to characterize the BSs, that is, $DT_j(t) = \{l_j(t), w_j, f_j^{e}\}$, where $l_j(t)$ denotes
the current location of $b_j$, $w_j$ denotes the bandwidth of $b_j$, and $f_j^{e}$ denotes the
computation resource.
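
The two kinds of 3-tuples and the edge set map naturally onto small record types. The sketch below is one possible representation, assuming two-dimensional locations and a nearest-BS association rule for building the edges; the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Location = Tuple[float, float]


@dataclass
class DeviceTwin:
    """Virtual model of a device: maximal power, location, local resource."""
    p_max: float
    location: Location
    f_local: float


@dataclass
class BaseStationTwin:
    """Virtual model of a BS (index 0 is the MBS): location, bandwidth, resource."""
    location: Location
    bandwidth: float
    f_edge: float


def build_edges(devices: Dict[int, DeviceTwin],
                stations: Dict[int, BaseStationTwin]) -> List[Tuple[int, int]]:
    """Edge information: connect every device to its nearest BS."""
    edges = []
    for i, dev in devices.items():
        nearest = min(
            stations,
            key=lambda j: math.dist(dev.location, stations[j].location),
        )
        edges.append((i, nearest))
    return edges
```

Because the device tuples are time-dependent, a digital twin would rebuild or refresh these records in every time slot as locations and power budgets change.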

The task offloading between the devices and BSs is facilitated through wireless 
communication. Here, we consider that devices communicate with the nearest BS 
for offloading. The wireless communication data rate between device $u_i$ and SBS $b_j$
can be expressed as


$$R_{ij}(t) = w_{ij}(t) \log\left(1 + \frac{p_i(t) h_{ij}(t) r_{ij}(t)^{-\alpha}}{\sigma^2 + I}\right), \qquad (6.8)$$


where $w_{ij}(t)$ ($w_{ij}(t) \le w_j$) is the bandwidth that SBS $b_j$ allocates to device $u_i$ in
time slot $t$, $p_i(t)$ is the transmission power of device $u_i$, $h_{ij}(t)$ is the current channel
gain, $\alpha$ is the path loss exponent, $\sigma^2$ is the noise power, $r_{ij}(t)$ is the distance between
$u_i$ and $b_j$, calculated based on the locations $l_i(t)$ and $l_j(t)$, and $I$ is the
interference from other SBSs. With the adoption of orthogonal frequency division
multiple access, the interference of different devices in the coverage of the MBS is
ignored. The wireless communication data rate between device $u_i$ and the MBS is


$$R_{i0}(t) = w_{i0}(t) \log\left(1 + \frac{p_i(t) h_{i0}(t) r_{i0}(t)^{-\alpha}}{\sigma^2}\right), \qquad (6.9)$$


where $w_{i0}(t)$ ($w_{i0}(t) \le w_0$) is the channel bandwidth between device $u_i$ and the
MBS in time slot $t$, $h_{i0}(t)$ is the channel gain between device $u_i$ and the MBS, and
$r_{i0}(t)$ is the distance between device $u_i$ and the MBS.
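
The two rate expressions differ only in the interference term, so a single helper covers both. The sketch assumes a base-2 logarithm (so that the result is in bits per second when the bandwidth is in Hz) and Euclidean distances computed from two-dimensional locations; the function names are hypothetical.

```python
import math


def distance(loc_device, loc_bs):
    """Euclidean distance between a device and a BS location."""
    return math.dist(loc_device, loc_bs)


def uplink_rate(bandwidth_hz, tx_power, channel_gain, dist, path_loss_exp,
                noise_power, interference=0.0):
    """Rate with distance-based path loss; interference=0.0 gives the MBS case."""
    sinr = (tx_power * channel_gain * dist ** (-path_loss_exp)
            / (noise_power + interference))
    return bandwidth_hz * math.log2(1.0 + sinr)
```

Calling `uplink_rate` with a positive `interference` models a device served by an SBS, while leaving it at zero models the MBS link, where interference is ignored.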

(2) Stochastic task queueing 

At the beginning of time slot $t$, device $u_i$ inputs a computation task of size
$A_i(t)$ (bits/slot) into the local dataset. We assume the $A_i(t)$ values in different time
slots are independent, and $\mathbb{E}[A_i(t)] = \lambda$. Since device $u_i$ has computation resources,
it can execute part of the computation task locally. We consider the size of the
computation task that is executed locally as $D_i^{l}(t)$. The size of the computation task
offloaded to BS $b_j$ ($j \in \mathcal{B}$) is $D_{ij}^{e}(t)$. The rest is stored in a local task buffer, as
shown in Fig. 6.4(a). Assume the queue length of the local task buffer is $Q_i^{l}(t)$ and
the queue length is dynamically updated with the following equation:


$$Q_i^{l}(t+1) = \max\{Q_i^{l}(t) - \Upsilon_i(t), 0\} + A_i(t), \qquad (6.10)$$


where $\Upsilon_i(t) = D_i^{l}(t) + D_{ij}^{e}(t)$ is the size of the computation task that leaves the task
buffer of device $u_i$ during time slot $t$.

Each edge server also has a task buffer to store the offloaded but not yet executed
tasks. As shown in Fig. 6.4(b), the queue length is dynamically updated by


$$Q_j^{e}(t+1) = \max\{Q_j^{e}(t) - \Upsilon_j(t), 0\} + \sum_{i \in \mathcal{U}} D_{ij}^{e}(t), \qquad (6.11)$$


where $\sum_{i \in \mathcal{U}} D_{ij}^{e}(t)$ is the amount of tasks offloaded from all the devices connected
to BS $b_j$, and $\Upsilon_j(t)$ is the size of the computation tasks leaving the edge task buffer.
According to the definition of stability in [75], the task queue is stable if all the
computation tasks satisfy the following constraints:


$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \sum_{i \in \mathcal{U}} \mathbb{E}\{Q_i^{l}(t)\} < \infty, \qquad (6.12a)$$
$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \sum_{j \in \mathcal{B}} \mathbb{E}\{Q_j^{e}(t)\} < \infty. \qquad (6.12b)$$
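
The queue recursion and the time-averaged backlog in the stability condition can be illustrated with a few lines of code. The sketch uses deterministic arrival and departure sequences over a finite horizon purely for illustration, whereas the stability definition concerns expectations as the horizon grows; the function names are hypothetical.

```python
def update_queue(backlog, departures, arrivals):
    """One step of the recursion: max{Q - departures, 0} + arrivals."""
    return max(backlog - departures, 0.0) + arrivals


def average_backlog(initial, departures, arrivals):
    """Finite-horizon time average (1/T) * sum over t = 0..T-1 of Q(t)."""
    q, total = initial, 0.0
    for served, arrived in zip(departures, arrivals):
        total += q
        q = update_queue(q, served, arrived)
    return total / len(departures)
```

When the service per slot exceeds the arrivals, the average backlog stays bounded as the horizon grows; when arrivals exceed the service, it grows with the horizon, which is the behaviour the stability constraints rule out.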


(3) Task offloading in the digital twin network 

Let $f_i^{l}(t)$ be the computation resource of device $u_i$ during time slot $t$ and let $c$
denote the required computation resource for executing one bit of a computation
task. Thus, the size of computation tasks executed locally will be


$$D_i^{l}(t) = \frac{\tau f_i^{l}(t)}{c}, \qquad (6.13)$$


where $\tau$ is the duration of the time slot. The energy consumption of a unit of
computation resource is $\varsigma (f_i^{l}(t))^2$, where $\varsigma$ is the effective switched capacitance,
depending on the chip architecture. The local energy consumption for computing
task $D_i^{l}(t)$ can be defined as


$$E_i^{l}(t) = \varsigma \tau (f_i^{l}(t))^3. \qquad (6.14)$$


Devices offload their tasks to BSs via wireless communication. Since the devices
are associated with different BSs, the offloaded tasks of device $u_i$ during time slot $t$
can be expressed as


$$D_{ij}^{e}(t) = \begin{cases} R_{ij}(t)\tau, & j \in \mathcal{B}/\{b_0\}, \\ R_{i0}(t)\tau, & j = b_0. \end{cases} \qquad (6.15)$$


The energy consumption in this case has three parts: the energy consumption for
uplink offloading, the energy consumption for computation, and the energy
consumption for downlink feedback. The third quantity is generally ignored due to its
small data size. Thus, the energy consumption for executing task $D_{ij}^{e}(t)$ on BS $b_j$
can be expressed as


$$E_{ij}^{e}(t) = p_i(t)\tau + \frac{D_{ij}^{e}(t) c}{f_{ij}^{e}(t)} e, \qquad (6.16)$$


where $f_{ij}^{e}(t)$ is the computation resource that $b_j$ allocates to device $u_i$ in time slot $t$,
and $e$ is the energy consumption for unit computation on edge servers.

The total energy consumption is the combination of local energy consumption,
edge server energy consumption, and the transmission energy consumption for
computation offloading. Therefore, the total energy consumption $E^{tot}(t)$ is obtained
by summing $E_i^{l}(t)$ and $E_{ij}^{e}(t)$ over all devices and their associated BSs (6.17).
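
The per-slot computation and energy quantities can be evaluated with a few helper functions. This is a sketch under stated assumptions: the local energy follows the cubic-frequency model implied by a per-unit energy proportional to the squared frequency, the edge energy is the uplink transmission energy plus the edge computation time multiplied by a per-unit energy factor, and all function and parameter names are hypothetical.

```python
def local_bits(f_local, tau, c):
    """Bits executed locally in one slot: tau * f / c."""
    return tau * f_local / c


def local_energy(f_local, tau, varsigma):
    """Local energy in one slot: varsigma * tau * f^3."""
    return varsigma * tau * f_local ** 3


def offloaded_bits(rate, tau):
    """Bits offloaded in one slot over a link with the given rate."""
    return rate * tau


def edge_energy(p_tx, tau, bits, c, f_edge, e_unit):
    """Uplink energy p * tau plus edge computation energy (bits * c / f) * e."""
    return p_tx * tau + bits * c / f_edge * e_unit
```

For example, a device running at 1 GHz for a one-second slot with 1000 cycles per bit processes one megabit locally; raising the frequency increases the processed bits linearly but the energy cubically.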

(4) Stochastic offloading problem

Based on the total energy consumption, we can define network efficiency as


$$\eta_{EE} = \lim_{T \to \infty} \frac{\frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}\{E^{tot}(t)\}}{\frac{1}{T} \sum_{t=0}^{T-1} \sum_{i \in \mathcal{U}} \mathbb{E}\{D_i^{l}(t) + D_{ij}^{e}(t)\}}. \qquad (6.18)$$


This is the ratio of long-term total energy consumption to the corresponding long-term
aggregate of accomplished computation tasks.

We define $a(t) = [w(t), p(t), \Upsilon(t), f^{l}(t), f^{e}(t)]$ as the system action in time slot $t$,
where $w(t)$ is the bandwidth allocation vector, $p(t)$ is the transmission power vector,
$\Upsilon(t)$ is the vector associated with the computation tasks leaving the edge servers, and
$f^{l}(t)$ and $f^{e}(t)$ are the vectors of local computation resources and of the computation
resources that edge servers allocate to the devices, respectively. Taking the network
stability constraint into account, the stochastic offloading problem for minimizing
$\eta_{EE}$ can be formulated as


$$\mathbf{P1}: \; \min_{a(t)} \; \eta_{EE}$$
$$\text{s.t.} \quad \sum_{i \in \mathcal{U}} \frac{w_{ij}(t)}{w_j} \le 1, \; w_{ij}(t) \ge 0, \qquad (6.19a)$$
$$0 \le p_i(t) \le P_{i,\max}(t), \qquad (6.19b)$$
$$0 \le f_i^{l}(t) \le f_i^{l}, \qquad (6.19c)$$
$$\sum_{i \in \mathcal{U}} f_{ij}^{e}(t) \le f_j^{e}, \; f_{ij}^{e}(t) \ge 0, \qquad (6.19d)$$
$$\Upsilon_j(t) c \le f_j^{e} \tau, \; \Upsilon_j(t) \ge 0, \qquad (6.19e)$$
$$(6.12a) - (6.12b).$$


Constraint (6.19a) is the bandwidth allocation constraint. Constraints (6.19b) and
(6.19c) denote the transmission power and computation resource constraints,
respectively. Constraint (6.19d) is the computation resource allocation constraint.
Constraint (6.19e) implies that the amount of computation resource for processing task
$\Upsilon_j$ cannot exceed the available computation resources.

Problem P1 is a stochastic optimization problem. The complex coupling among
optimization variables and the mixed combinatorial nature of the problem make P1
difficult to solve. Further, the stochastic task arrival, dynamic channel state
information, and dynamic task buffer make it challenging to design an efficient
resource management policy for the devices and edge servers. We therefore exploit
Lyapunov optimization to transform the original stochastic optimization problem into
a deterministic per-time-slot problem and propose a stochastic computation offloading
algorithm to solve P1.


6.4.3 Lyapunov Optimization for Stochastic Computation Offloading


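We define the quadratic Lyapunov function as the sum of the squared queue backlogs,


$$L(\Theta(t)) = \frac{1}{2} \sum_{i \in \mathcal{U}} [Q_i^{l}(t) - \beta_i]^2 + \frac{1}{2} \sum_{j \in \mathcal{B}} [Q_j^{e}(t)]^2, \qquad (6.20)$$


where $\Theta(t) = [Q^{l}(t), Q^{e}(t)]$ represents the current task queue lengths of the devices
and edge servers, and $\beta$ is a perturbation vector. Further, we define the Lyapunov
drift-plus-penalty function as


$$\Delta_V L(\Theta(t)) = \Delta L(\Theta(t)) + V \mathbb{E}[\eta_{EE}(t) \,|\, \Theta(t)], \qquad (6.21)$$


where $\Delta L(\Theta(t)) = \mathbb{E}[L(\Theta(t+1)) - L(\Theta(t)) \,|\, \Theta(t)]$ is the conditional drift, and
$V$ is a non-negative weight parameter. By minimizing $\Delta_V L(\Theta(t))$, we can ensure
network stability and simultaneously minimize $\eta_{EE}$. The upper bound
of $\Delta_V L(\Theta(t))$ can be derived as


$$\Delta_V L(\Theta(t)) \le C - \sum_{i \in \mathcal{U}} [Q_i^{l}(t) - \beta_i] \, \mathbb{E}[\Upsilon_i(t) - A_i(t) \,|\, \Theta(t)] - \sum_{j \in \mathcal{B}} Q_j^{e}(t) \, \mathbb{E}\Big[\Upsilon_j(t) - \sum_{i \in \mathcal{U}} D_{ij}^{e}(t) \,\Big|\, \Theta(t)\Big] + V \mathbb{E}[\eta_{EE}(t) \,|\, \Theta(t)], \qquad (6.22)$$


where $C = \frac{1}{2} \sum_{i \in \mathcal{U}} [\Upsilon_{i,\max}^2 + A_{i,\max}^2] + \frac{1}{2} \sum_{j \in \mathcal{B}} [\Upsilon_{j,\max}^2 + (\sum_{i \in \mathcal{U}} D_{ij,\max}^{e})^2]$, and
$\Upsilon_{i,\max}$, $A_{i,\max}$, $\Upsilon_{j,\max}$, and $D_{ij,\max}^{e}$ are the upper bounds of $\Upsilon_i(t)$, $A_i(t)$, $\Upsilon_j(t)$, and
$D_{ij}^{e}(t)$, respectively. Based on Lyapunov optimization theory, we can minimize the
right side of the inequality in (6.22) to obtain the optimal solution of P1. Specifically,
instead of solving P1, we can observe $\Theta(t)$ and $A_i(t)$ to determine $a(t)$ by solving
the following problem in each time slot:


$$\mathbf{P2}: \; \min_{a(t)} \; V \Big[ E^{tot}(t) - \eta_{EE} \sum_{i \in \mathcal{U}} \big( D_i^{l}(t) + D_{ij}^{e}(t) \big) \Big] - \sum_{i \in \mathcal{U}} [Q_i^{l}(t) - \beta_i] [\Upsilon_i(t) - A_i(t)] - \sum_{j \in \mathcal{B}} Q_j^{e}(t) \Big[ \Upsilon_j(t) - \sum_{i \in \mathcal{U}} D_{ij}^{e}(t) \Big] \qquad (6.23)$$
$$\text{s.t.} \quad (6.12a) - (6.12b), \; (6.19a) - (6.19e).$$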


Problem P2 needs to minimize the system cost per time slot. Here, we use DRL to 
solve P2, because it is efficient for finding a near-optimal solution in real time. 
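
The per-slot decision can be sketched as evaluating a drift-plus-penalty cost for each candidate action and keeping the cheapest one. The form of the cost below is an assumption: it combines a penalty term weighted by V with queue-weighted service terms, as in a standard drift-plus-penalty bound. The exhaustive search in `best_action` is only a stand-in for the DRL agent and is feasible only for a small, discretized candidate set. All names are hypothetical.

```python
def per_slot_cost(V, e_tot, eta_ee, done_bits, q_local, beta, served_local,
                  arrivals, q_edge, served_edge, offloaded_in):
    """ASSUMED per-slot objective: weighted penalty minus queue-weighted service.

    done_bits: bits accomplished per device in this slot
    q_local, beta, served_local, arrivals: per-device queue quantities
    q_edge, served_edge, offloaded_in: per-BS queue quantities
    """
    penalty = V * (e_tot - eta_ee * sum(done_bits))
    drift_local = sum((q - b) * (y - a)
                      for q, b, y, a in zip(q_local, beta, served_local, arrivals))
    drift_edge = sum(q * (y - d)
                     for q, y, d in zip(q_edge, served_edge, offloaded_in))
    return penalty - drift_local - drift_edge


def best_action(candidates, cost_fn):
    """Exhaustive search over candidate actions (a stand-in for DRL)."""
    return min(candidates, key=cost_fn)
```

A larger V puts more weight on the energy-related penalty, while longer queues put more weight on serving backlog, which is the usual trade-off controlled by the weight parameter.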

To solve P2, the system first constructs a Markov decision process, that is,
$\mathcal{M} = (\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R})$, and then uses a DRL algorithm to explore the actions. As shown in
Fig. 6.5, the network state $s(t)$ is constructed by digital twin and output to the DRL agent.
To gather network information, digital twin needs to predict the locations, energy,
and the generated task flow of the devices and BSs. The locations can be predicted
by the K-nearest neighbours classification method in [76]. To prolong the battery
life of the devices, some of them are equipped with energy-harvesting chips, such
as solar panels. Digital twin thus needs to support solar energy prediction here. The
generated task flow is based on the application running on each device. Digital twins
are used to first predict and gather the information on location, energy, and task flow.
Then, based on the gathered information, digital twin updates the network topology,
channel condition, and task queueing models. Finally, digital twin generates the
current state and transmits it to the DRL agent.

Fig. 6.5 Digital twin-enabled DRL

The DRL agent constructs the system state as $s(t) = \{R(t), F, P_{\max}(t), w, \Theta(t)\}$
with wireless data rate, computation resource, transmission power, bandwidth, and
task queueing information. Action $a(t) = [w(t), p(t), \Upsilon(t), f^{l}(t), f^{e}(t)]$ is constructed
with the bandwidth allocation, the transmission power, the executed computation task,
and the computation resource allocation. It is worth noting that all the variables in
action $a(t)$ are continuous. Thus, we utilize a policy gradient-based DRL algorithm
to explore policy. After executing action $a(t)$, digital twin updates the system state
and estimates the immediate reward $R^{imm}(s(t), a(t))$. Because the distribution of
transition probabilities is often unknown in DRL, the DRL agent utilizes a deep neural
network to approximate it. We define the immediate reward function $R^{imm}(s(t), a(t))$
as the objective of problem P2. After computing the immediate reward, the system
updates its state from $s(t)$ to $s(t+1)$ based on action $a(t)$.

We use an online and asynchronous DRL algorithm to explore policy. The online 
DRL consists of a global agent and multiple learning agents. The detailed policy 
is explored by the learning agent in each SBS. The policy learned by the learning 
agent is $a(t) = \pi(s(t)|\theta_\pi)$, where $\pi(s(t)|\theta_\pi)$ is the explored offloading and resource
allocation policy produced by a deep neural network. According to $s(t)$ and $a(t)$, the
DRL agent can produce the reward and the next state. To evaluate the performance
of the proposed DRL algorithm, we consider a network topology with one MBS,
M = 3 SBSs, and N = 20 devices. Each learning agent has an actor network and a 
critic network. The actor network has three fully connected hidden layers, each with 
128 neurons, and an output layer with eight neurons using the softmax function as 
the activation function. The critic network has three fully connected hidden layers, 
each with 128 neurons, and one linear neuron output layer. 
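
The layer sizes described above can be made concrete with a small forward-pass sketch in plain Python. Several details are assumptions, since the text does not specify them: the state dimension (`STATE_DIM`), the ReLU activation on hidden layers, and the random initialization. A deep learning framework would normally be used for training; this sketch only shows the shapes of the two networks.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights (n_out x n_in) and zero biases for one dense layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_mlp(n_in, hidden_sizes, n_out, rng):
    """List of dense layers: input -> hidden layers -> output."""
    sizes = [n_in] + list(hidden_sizes) + [n_out]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def dense(x, layer):
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, layers, output_fn):
    """ReLU on hidden layers (an assumption), output_fn on the last layer."""
    for layer in layers[:-1]:
        x = [max(0.0, z) for z in dense(x, layer)]
    return output_fn(dense(x, layers[-1]))


STATE_DIM = 10  # hypothetical state size; not given in the text
rng = random.Random(0)
actor = build_mlp(STATE_DIM, [128, 128, 128], 8, rng)   # 8 softmax outputs
critic = build_mlp(STATE_DIM, [128, 128, 128], 1, rng)  # one linear output

state = [rng.random() for _ in range(STATE_DIM)]
actor_out = forward(state, actor, softmax)
critic_out = forward(state, critic, lambda v: v)
```

The actor output is a vector of eight non-negative values summing to one, and the critic output is a single unbounded value estimate.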

Figure 6.6 depicts the system costs with respect to training episodes under different 
schemes. The green curve is the benchmark of the joint optimization of computation 
offloading, the bandwidth, and the transmission power, but without computation 
resource allocation. The orange curve is the benchmark of the joint optimization of 
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Fig. 6.6 System costs under different schemes 


the computation offloading and computation resource allocation. Figure 6.6 shows 
that the proposed scheme outperforms the two benchmarks, since 
it can concurrently optimize computation offloading, the bandwidth, the transmission 
power, and the computation resource allocation. In addition, the system cost of the 
orange curve is lower than that of the green curve. This means that, compared with 
the optimization of computation resources, the joint optimization of the bandwidth 
and transmission power has a greater influence on performance. 
Fig. 6.7 System costs with respect to the number of devices under different schemes 


Figure 6.7 compares the system costs with respect to the number of devices under 
different schemes. The number of devices ranges from 10 to 40. From Fig. 6.7, we 
can make two observations. First, for each of the three schemes, the system cost 
increases with the number of devices. The reason is that the increase of devices leads 
to more offloading requests, resulting in the consumption of more communication 
and computation resources. Second, the proposed algorithm outperforms the 
two benchmarks by jointly optimizing computation offloading, bandwidth, 
transmission power, and computation resource allocation. 
Chapter 7 
Digital Twin for Aerial-Ground Networks 


Abstract With the widespread deployment of unmanned aerial vehicles (UAVs) in 
civil and military fields, researchers have turned their attention towards the emerging 
area of aerial-ground networks for computing-intensive applications, data-intensive 
applications, and network-intensive applications. However, the application of aerial- 
ground networks relies on dynamic perception and intelligent decision making, 
which are difficult to achieve due to the heterogeneity of ground devices and the 
complexity of the aerial-ground environment. The convergence of digital twin (DT) 
and UAVs has great potential to tackle the challenge and improve the service quality 
and stability in applications such as search and rescue and communication relaying. 
This chapter first investigates the advantages, challenges, and key techniques of DTs 
for aerial-ground networks. In addition, we highlight the main issues of DT for 
UAV-assisted aerial-ground networks with two case studies, including cross-domain 
resource management and intelligent cooperation among devices. 


7.1 Introduction 


Recently, aerial-ground networks based on unmanned aerial vehicles (UAVs) have 
achieved great success in various applications, such as disaster relief, service congestion, 
and damage assessment. Thanks to their inherent advantages, such as wide coverage, 
high flexibility, and strong resilience, UAVs can act as aerial mobile base stations 
to provide seamless and intelligent services for ground devices. However, due to the 
heterogeneity and mobility of ground devices and the dynamic network topology, 
the advantages of aerial-ground networks cannot be fully exploited. As an emerging 
digital mapping technology, digital twin (DT) has great potential to tackle the network 
dynamics and complexity of aerial-ground networks. By mapping the channel state 
and computing state, a DT established on a UAV can reflect the state of ground devices 
or network topology in a timely manner and accurately capture their state changes. 
After learning from these complex statuses, DTs established on UAVs can support 
diversified applications, such as trajectory planning, large-scale mapping, urban 
modelling, road patrol, and anti-piracy. We detail the main application scenarios of 
DT deployed on UAVs in different fields as follows. 


* Smart city: In smart cities, with the help of DT deployed on UAVs, we can build 
a large-scale virtual city, dynamically monitor urban facilities, allocate urban 
public resources, and further realize intelligent collaborative decision making in 
urban management. 

* Disaster rescue: In the disaster rescue field, UAVs with DT can analyse the con- 
nection performance of ground rescue devices and then make proactive commu- 
nication resource allocations for high-priority devices to maximize the long-term 
quality of service (QoS). 

* Telemedicine: In the field of telemedicine, through smart wearable devices, pa- 
tients’ health information can be sent back to DTs deployed on UAVs. UAVs 
with DT can track and monitor a patient’s health status remotely and in a timely 
manner. When the DT measures any abnormal information, the rescue agency 
can immediately provide first aid services. 

* Internet of vehicles: In the Internet of Vehicles (IoV), UAVs with DT can com- 
petently implement the real-time planning of vehicle trajectories in a specific 
area. At the same time, services such as status awareness and mobility predic- 
tion provided by DT can effectively avoid traffic congestion and reduce traffic 
accidents. 


DT is able to assist in the optimal allocation and intelligent dispatching of valuable 
aerial resources. We further summarize the advantages of DT and UAV fusion as 
follows. 


* Hyperconnectivity: Due to the wide coverage of UAVs over ground devices, DT 
deployed on UAVs can achieve interoperability and hyperconnectivity with physi- 
cal counterpart devices. DT deployed on a UAV can connect all the ground devices 
in the aerial-ground network. We can fully utilize the advantages of DT from the 
multidimensional integration of information to sense how different devices work 
together, thus building an aerial-ground network with hyperconnectivity. In an 
aerial-ground network with hyperconnectivity, DT has the interaction details and 
status information of all the devices, which can then dynamically provide optimal 
decisions for different problems. 

* Low latency: Thanks to the mobility of UAVs, DT deployed on a UAV can maintain 
a specified synchronization frequency with the ground device, which enhances 
the fidelity of the signal and brings more reliable DT services to the device. 
DT is sensitive to synchronization frequency, and untimely state synchronization 
or instruction updates can cause DT to make incorrect decisions. UAVs have 
the ability to move with mobile devices such as vehicles, significantly reducing 
DT status update delays due to communication distance. DT deployed on UAVs 
can better meet the requirements of devices for low network latency in different 
application scenarios, such as real-time trajectory planning in IoV, and provide 
services with higher performance and reliability. 

* Strong stability: DT deployed on UAVs can monitor the status of each aerial- 
ground network in real time, which ensures the coverage and stability of the 
aerial-ground network and makes the DT service more stable. Deploying DT 
on the ground makes it difficult to perform timely maintenance in the event 
of an attack or communication failure, which will lead to the interruption of 
DT services. DT deployed on a UAV can closely monitor the state changes 
in different aerial-ground networks. After detecting emergency situations such as 
UAV damage and network failure, DT can immediately replenish and replace 
UAVs, continuously providing high-stability and high-performance services for 
devices. 


The complementary advantages of DT and UAV play an important role in diverse 
applications that require stable network connections. However, there are still 
challenges in how to customize DT on UAVs for smart services in aerial-ground 
networks. We summarize the challenges in two cases. 


* Cross-domain resource allocation: An aerial-ground network involves two 
different resource domains: the aerial domain and the terrestrial domain. The main 
challenge that DT faces on UAVs is the effective allocation of limited resources 
across domains under resource and distance constraints. DT-enabled intelligent 
services are often supported by a large amount of data distributed over various 
terminal devices. In a large-scale aerial-ground network, there are limitations 
of physical distance, communication resources, and computing resources; 
therefore, how DT deployed on UAVs effectively allocates resources across domains 
deserves in-depth study. In addition, DT modelling relies on abundant computing 
resources and a sufficient energy supply, which the limited energy capacity of 
UAVs can hardly support and which further limits the endurance of UAVs. 

* Cross-device intelligent collaboration: The intelligent collaboration of different 
devices in an aerial-ground network is an important link to keep the network 
running efficiently. One of the important features of aerial-ground networks is 
a highly dynamic network environment. Diverse devices are constantly joining 
and withdrawing from the network, and mobile devices such as vehicles, UAVs, 
and mobile phones have low latency tolerance. For DTs deployed on UAVs, 
enabling different devices to achieve dynamic joint decision making and intelligent 
collaboration in tasks such as autonomous driving and trajectory planning while 
reducing network latency is challenging. 


7.2 Key Techniques 


7.2.1 Cross-Domain Resource Management 


Aerial-ground networks can enhance the environmental perception and decision 
making capabilities of the network by leveraging multidimensional resources to 
achieve resource management. However, the resources in different domains (such 
as air and ground) are complicatedly coupled, and the orchestration of these cross- 
domain resources is confronted with a huge state-action space, which makes it 
difficult to allocate resources optimally in real time [78, 79]. To effectively manage 
the multidimensional resources (for communication, computing, and caching) of 
aerial-ground networks, the state change and QoS of the network are the key factors 
to consider. 

Ensuring the flexibility and efficiency of resource management: Aerial-ground 
networks are extremely dynamic and complex because of the high 
mobility of heterogeneous devices and the large scale of the networks. It is difficult to 
achieve flexible and efficient network resource management. As an emerging digital 
mapping technology, DT provides an approach for realizing effective and reliable 
network orchestration by mapping and predicting the dynamics of networks. Deng et 
al. in [80] proposed a combined approach of expert knowledge, reinforcement learn- 
ing, and DT to cope with the dynamic changes of high-dimensional network states. 
Dai et al. in [74] proposed a new paradigm, the DT network, for the Industrial Internet 
of Things (IIoT) and formulated stochastic computation offloading and resource 
allocation problems, using Lyapunov optimization to transform the original 
problem into a deterministic per-slot problem. Lu et al. in [37] proposed a DT edge 
network to fill the gap between the physical edge network and the digital system. The 
integration of DT technology into aerial-ground networks can yield considerable im- 
provement in both the latency performance and computing efficiency of applications 
running on ground devices and aerial devices. 

Software-defined networking (SDN) can be utilized to construct and manage vir- 
tual networks to support specific network services for flexible network management 
[81]. Based on SDN architecture, Li et al. in [82] modelled multidimensional re- 
source scheduling as a partially observable Markov decision process and used value 
iteration to jointly optimize networking, caching, and computing. Due to the compli- 
cated coupling of multidimensional resources, the central controller can hardly know 
a priori the effects of its actions on system performance. To this end, He et al. in [83] 
proposed a resource orchestration method based on deep reinforcement learning, 
with which the central controller learns an effective policy via trial-and-error search. 

Ensuring QoS performance: The effective management of the multidimensional 
resources (for communication, computing, and caching) of aerial-ground networks to 
guarantee the required QoS performance of ground devices is also an important chal- 
lenge. High computational complexity, the large cost of equipment deployment, and 
limited resources are the factors that hinder the improvement of QoS performance. 


* Reducing computational complexity: Due to the limited computing and commu- 
nication capabilities of ground devices, task offloading, as a key technology, can 
effectively improve service execution efficiency and realize the fast and efficient 
response of ground devices. Task offloading means that resource-constrained mo- 
bile terminal devices can offload overloaded computing tasks to edge nodes with 
stronger computing or communication capabilities, to improve computing speed 
and save energy. For example, road side units (RSUs) can undertake computation- 
intensive tasks (e.g. semantic image segmentation, motion planning, and route 
planning) for vehicles. Xu et al. in [30] proposed a service offloading method 
with deep reinforcement learning in DT-empowered IoV to provide vehicular 
services with a high QoS level. To reduce processing delays, Do-Duy et al. in [84] 
proposed a novel DT framework assisting in the task offloading of IoT devices 
for IIoT networks with mobile edge computing. Qu et al. in [85] proposed a deep 
meta-reinforcement learning offloading algorithm that combines multiple parallel 
deep neural networks with Q-learning, quickly and flexibly obtaining the optimal 
offloading strategy from a dynamic environment. 

* Reducing equipment deployment costs: To achieve effective resource management 
and satisfy the diverse QoS requirements, the deployment cost of edge nodes 
cannot be ignored in aerial-ground networks. Using a large number of edge 
nodes to completely cover an area means a large deployment cost. When ground 
devices offload computing tasks to nearby edge nodes through the assistance of 
UAVs, the appropriate incentive is required for edge nodes to contribute their 
services. Edge nodes can be unwilling to contribute their services if the rewards 
cannot compensate for their service costs. Sun et al. in [86] designed an incentive 
mechanism to motivate RSUs to provide computing resources for ground vehicles. 
It was able to effectively complete vehicle task offloading schemes with the 
assistance of UAVs in an aerial-ground network. Zhou et al. in [87] proposed 
a novel incentive-driven and deep Q-network-based method and combined a 
content caching strategy and incentive mechanism to improve the performance of 
device-to-device offloading. To realize the long-term stability of DT services, Lin 
et al. in [88] designed an incentive-based congestion control scheme to offload 
real-time mobile data captured by DT to mobile edge computing servers. 

* Reducing the burden of aerial devices with limited resources: Most works ignore 
the fact that centralized resource allocation schemes introduce a great burden 
to aerial devices, especially to UAVs in aerial-ground networks. Moreover, the 
incentive mechanism can be computation intensive, which results in service- 
unrelated energy consumption and further deteriorates service endurance. Thus, 
the resource allocation scheme should be carried out in a distributed manner. 
Through cooperative networks [89], SDN controllers can be decomposed into 
multiple simpler controllers to reduce the complexity of a large action space. 
Nasir et al. in [90] thus leveraged multi-agent deep Q-learning to distributedly 
schedule power allocation in wireless networks. The alternating direction method 
of multipliers (ADMM) is a distributed parallel optimization algorithm, and 
resource allocation problems based on ADMM have attracted much attention. 
Wang et al. in [91] formulated computation offloading, resource allocation, and 
content caching strategies as an optimization problem and designed an 
ADMM-based algorithm to solve it. Liang 
et al. in [92] proposed an efficient ADMM-based distributed virtual resource 
allocation algorithm in virtualized wireless networks. In addition, Zheng et al. 
in [93] designed a converged and scalable Stackelberg game-based ADMM for 
edge caching to solve storage allocation games and user allocation games in a 
distributed manner. 
7.2.2 Cross-Device Intelligent Cooperation 


In aerial-ground networks, heterogeneous ground devices can collaborate with aerial 
devices to accomplish intelligent network orchestration based on federated learning. 
Cross-device intelligent cooperation plays an important role in the efficient operation 
of networks and stable network environments. However, due to the heterogeneity, 
mobility, and selfishness of devices, cross-device intelligent cooperation based on 
federated learning still faces many challenges. For example, further optimization is 
needed in terms of communication efficiency, training efficiency, and training costs. 
DT has the powerful ability to capture the state of heterogeneous devices in real 
time, which can effectively promote cross-device intelligent cooperation. 

Improving communication efficiency: The heterogeneity and high mobility of 
devices complicate network management. The real-time changes of device states 
can lead to inaccurate channel estimation and affect the communication efficiency 
of federated learning. DT can analyse the connection performance of devices and 
make proactive communication resource allocations for improving communication 
efficiency. Lu et al. in [9] proposed a blockchain-based DT-enabled federated learn- 
ing scheme to improve communication efficiency. Tran et al. in [94] studied the 
collaborative optimization problem when devices participate in federated learning 
in wireless networks. By adjusting a device's resource allocation strategy and the 
local training update frequency between two global aggregations, the best trade-off 
between communication time and computing performance can be achieved. Sun et 
al. in [33] used deep reinforcement learning to adaptively adjust the cooperative 
aggregation strategy of federated learning to achieve the balanced optimization of 
communication and computing. Krouka et al. in [95] proposed a novel distributed re- 
inforcement learning algorithm to solve the random interference and communication 
interference of wireless channels and optimize communication efficiency. 

Improving training efficiency: The dynamic nature of aerial-ground networks 
makes it difficult for heterogeneous devices to complete collaborative computing, 
so it is difficult to improve the training efficiency of federated learning. Lu et al. 
in [37] proposed a blockchain-empowered federated learning framework operating 
in a DT wireless network that comprehensively considers DT association, training 
data batch size, and bandwidth allocation to formulate the training optimization 
problem. Jiang et al. in [36] exploited blockchain to propose a new DT edge network 
framework and designed a joint cooperative federated learning and local model 
update verification scheme that achieves the optimal unified time. Zhang et al. 
in [96] proposed a reinforced federated learning scheme based on deep 
multi-agent reinforcement learning to optimize the training performance of federated 
learning in distributed IoT networks. Li et al. in [97] proposed a platform-assisted 
collaborative learning framework. This framework can rapidly adapt to learning a 
new task at the target edge node by using a federated meta-learning approach with 
a few samples. Existing collaborative computing needs to restart learning as the 
topology changes, which leads to the failure or slow convergence of the established 
cooperative mechanism. DT can capture a complex network topology dynamically 
and improve the efficiency of collaborative computing between devices. 
Reducing training costs: It is necessary to encourage heterogeneous devices to 
participate in intelligent 
cooperation. Heterogeneous devices need to spend their resources and costs to train 
the federated learning model. They are therefore reluctant to participate in training 
without appropriate incentives [98, 99]. Existing incentive mechanisms can perform 
poorly due to the insufficient utilization of massive data and inaccurate modelling 
of operations in dynamic aerial-ground networks. DT can reduce the information 
asymmetry between devices by monitoring the status of devices in real time. Yang 
et al. in [100] introduced the Stackelberg game to establish an interaction model that 
comprehensively considers the data size, training time, and power consumption to 
measure the contribution to motivate client participation. Lim et al. in [101] studied 
the incentive mechanism for federated learning in UAV-assisted IoV to encourage 
contributions from data owners, considering information asymmetry between UAVs 
and the data owners. Federated learning is data driven, and the motivation of clients 
and the quality of data they provide have an important impact on the training results 
[102, 103]. The incentive mechanism combined with DT is suitable for motivating 
heterogeneous devices to actively participate in training in aerial-ground networks. 

In summary, cross-device intelligent cooperation based on federated learning in 
an aerial-ground network still needs to be studied further. The integration of DT and 
aerial-ground networks can provide favourable support for realizing the cooperation 
mechanism of model-free, self-learning, autonomous intelligence. 
Fig. 7.1 A DT-driven aerial-ground network system model 
7.3 DT for Task Offloading in Aerial-Ground Networks 
7.3.1 System Model 


To realize the efficient allocation of cross-domain resources from air and ground, 
we establish a dynamic DT model for an aerial-ground network. The DT model 
can capture the time-varying demand and supply of cross-domain resources in the 
network. Deploying the DT model on devices in the network can significantly im- 
prove the environmental perception, computing efficiency, and delay performance 
of the devices. This is beneficial for the unified and efficient resource allocation and 
scheduling in aerial-ground networks. 

As shown in Fig. 7.1, we consider a DT-driven aerial-ground network in which 
vehicles act as ground devices and UAVs act as aerial devices. The network is 
composed of vehicles, RSUs, UAVs and DTs. We assume the UAVs are responsible 
for areas that are not covered by RSUs, as a supplement to the ground network. 
In such areas, vehicles and RSUs are able to deliver messages to a UAV directly 
with line-of-sight communication. With the assistance of UAVs, the vehicles not 
covered by the ground network could offload their computing tasks to RSUs to 
reduce their computing burden. We establish two DT models, including the DT of a 
group of RSUs and the DT of vehicles. Both DTs are established in UAVs to update 
the network topology and traffic load in real time and help UAVs make specific 
decisions, such as path planning. The DT of a group of RSUs can be given by 


$$D^{r} = \{F^{r}, G^{r}, L^{r}\}, \tag{7.1}$$


where $F^{r}$ is a vector describing the available computing resource status of the RSUs, 
$G^{r}$ is the network topology between the RSUs, and $L^{r}$ is the network transmission 
load of the RSUs. 

The DT of vehicles can be given by 


$$D^{v} = \{G^{v}, L^{v}, C, Q\}, \tag{7.2}$$


where $G^{v}$ is the network topology of the vehicles, $L^{v}$ is the communication load 
of the vehicles, $C$ represents the demand information of the vehicles at this time, 
and $Q$ is the preference of the vehicles for the resource providers. The preference is 
determined by the historical service of the vehicles in a specific type of offloading 
task. 
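
To make the two twin models more concrete, the sketch below represents them as plain Python data classes. All class and field names are hypothetical and chosen only for this illustration. The exponential moving average used to update a preference from a service rating is likewise an assumption; the text states only that preferences are derived from historical service.

```python
from dataclasses import dataclass
from typing import List, Tuple

Edge = Tuple[int, int]  # an undirected link between two node indices


@dataclass
class RsuGroupTwin:
    """Twin of a group of RSUs (hypothetical field names)."""
    free_cpu: List[float]   # available computing resource per RSU
    topology: List[Edge]    # network topology between the RSUs
    tx_load: List[float]    # network transmission load per RSU


@dataclass
class VehicleTwin:
    """Twin of the vehicles (hypothetical field names)."""
    topology: List[Edge]            # network topology of the vehicles
    comm_load: List[float]          # communication load per vehicle
    demand: List[float]             # current demand per vehicle
    preference: List[List[float]]   # preference[n][m]: vehicle n for RSU m

    def update_preference(self, n: int, m: int, rating: float, alpha: float = 0.2) -> None:
        """Blend a new service rating into the stored preference.

        Assumption: an exponential moving average with weight alpha.
        """
        old = self.preference[n][m]
        self.preference[n][m] = (1.0 - alpha) * old + alpha * rating


rsus = RsuGroupTwin(free_cpu=[4.0e9, 2.0e9], topology=[(0, 1)], tx_load=[0.4, 0.7])
vehicles = VehicleTwin(
    topology=[(0, 1)],
    comm_load=[0.3, 0.1],
    demand=[1.0e9, 2.0e9],
    preference=[[0.5, 0.5], [0.2, 0.8]],
)
vehicles.update_preference(0, 1, rating=1.0)  # vehicle 0 was well served by RSU 1
```

Keeping both twins as simple records makes it cheap for a UAV to refresh them whenever the topology or load changes.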


7.3.2 Utility Function 


The DT of a group of vehicles and the DT of a group of RSUs have different utilities. 
The set of RSUs in the network is $\mathcal{M} = \{1, \ldots, m, \ldots, M\}$. The set of vehicles that 
require offloading tasks is $\mathcal{N} = \{1, \ldots, n, \ldots, N\}$. Vehicle $n$ wants to maximize its 
service satisfaction, which is the accumulated satisfaction it achieves from various 
RSUs. A vehicle’s satisfaction is defined as the ratio of its cumulative satisfaction 
from RSUs to the total amount of resources it receives. Thus the satisfaction of 
vehicle $n$ is given by 


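$$S_n = \frac{\sum_{m \in \mathcal{M}} \left( q_{n,m}\, p_{m,n} - \dfrac{q_{n,m}\, p_{m,n}^{2}}{2\bar{f}} \right)}{\sum_{m \in \mathcal{M}} p_{m,n}}, \tag{7.3}$$


where $\bar{f}$ is the maximum expected value of resources from RSUs, $p_{m,n}$ represents the 
CPU frequency obtained by vehicle $n$ at RSU $m$, and $q_{n,m}$ represents the preference 
of vehicle $n$ for RSU $m$. 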
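
A small numerical sketch can help in reading the satisfaction ratio. It assumes the quadratic form in which each RSU contributes q·p − q·p²/(2·f_bar) to the numerator and the denominator is the total CPU frequency received; the factor 2 in the quadratic term is an assumption of this example, and the function and argument names are hypothetical.

```python
def vehicle_satisfaction(p_n, q_n, f_bar):
    """Satisfaction of one vehicle.

    p_n[m]: CPU frequency the vehicle obtains at RSU m
    q_n[m]: the vehicle's preference for RSU m
    f_bar : maximum expected value of resources from the RSUs
    Assumed form: sum_m (q*p - q*p^2 / (2*f_bar)) / sum_m p.
    """
    total = sum(p_n)
    if total == 0:
        return 0.0  # no resources received, so no satisfaction is accumulated
    gained = sum(q * p - q * p * p / (2.0 * f_bar) for p, q in zip(p_n, q_n))
    return gained / total


# Frequencies in arbitrary units; the larger share comes from the preferred RSU.
s_pref = vehicle_satisfaction([2.0, 1.0], [0.9, 0.4], f_bar=4.0)
# Same allocation, but the preferences are swapped.
s_swap = vehicle_satisfaction([2.0, 1.0], [0.4, 0.9], f_bar=4.0)
```

Under this form, receiving the larger share from the preferred RSU yields the higher satisfaction (`s_pref > s_swap`), which is the behaviour the preference term is meant to capture.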

The DT of RSUs tries to minimize energy consumption. The energy consumption 
on the RSU is related to the frequency and duration of the CPU used. We can express 
energy consumption as 


$$E(P) = \sum_{m \in \mathcal{M}} \sum_{n \in \mathcal{N}} \omega\, p_{m,n}^{2}\, c_{n,m}, \tag{7.4}$$


where $\omega$ represents the effective capacitance parameter of the computing chipset, 
and $c_{n,m}$ is the number of CPU cycles required for RSU $m$ to calculate its tasks 
for vehicle $n$. The detailed resource scheduling of each vehicle is expressed as 


$$P = \{P_m,\; m \in \mathcal{M}\}.$$
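
The energy expression can be evaluated directly. The sketch below assumes the common CMOS model in which a task of c cycles run at frequency p costs ω·p²·c, summed over all RSU and vehicle pairs. It also shows, for a single RSU with a fixed frequency budget, the allocation that minimises this energy when satisfaction is ignored (p proportional to 1/c, which follows from the Lagrangian condition 2·ω·c·p = λ). Function names and the sample numbers are hypothetical.

```python
def total_energy(P, C, omega):
    """Sum of omega * p^2 * c over all RSUs m and vehicles n.

    P[m][n]: CPU frequency RSU m gives to vehicle n
    C[m][n]: CPU cycles RSU m needs for vehicle n's task
    """
    return sum(
        omega * c * p * p
        for row_p, row_c in zip(P, C)
        for p, c in zip(row_p, row_c)
    )


def min_energy_split(c, budget):
    """Energy-minimal split of one RSU's frequency budget (satisfaction ignored).

    Minimising sum(omega * c_n * p_n^2) subject to sum(p_n) == budget
    gives p_n proportional to 1 / c_n.
    """
    inv = [1.0 / c_n for c_n in c]
    scale = budget / sum(inv)
    return [scale * w for w in inv]


OMEGA = 1e-28                      # assumed effective capacitance
cycles = [[1.0e9, 2.0e9]]          # one RSU serving two vehicles
budget = 3.0e9                     # total CPU frequency of the RSU (Hz)

equal = [[budget / 2, budget / 2]]
optimal = [min_energy_split(cycles[0], budget)]

e_equal = total_energy(equal, cycles, OMEGA)
e_optimal = total_energy(optimal, cycles, OMEGA)
```

The gap between `e_equal` and `e_optimal` illustrates why an energy-only objective tends to pull allocations away from what the vehicles prefer.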


7.3.3 Distributed Incentives for Satisfaction and Energy Efficiency 
Maximization 


The goals of RSUs and the DT of RSUs are different. An RSU is designed to max- 
imize the average satisfaction of the vehicles, whereas an RSU's DT is designed to 
maximize global energy efficiency. Although these quantities are used to formulate 
the allocation scheme of computing resources, it is difficult to achieve the goal of 
Fig. 7.2 Workflow of a game and Jacobian ADMM-based algorithm 
minimizing total energy consumption when they have different optimal values. In 
addition, due to limited computing resources, computationally intensive centralized 
computing creates pressure for UAVs. Therefore, we propose an incentive mecha- 
nism based on the Stackelberg game and Jacobian ADMM to allocate computing 
resources, so that the DT of RSUs and the RSUs can reach a consensus on the 
allocation scheme and solve the whole problem in a distributed and parallel manner. 

Due to the complexity of solving the desired objectives of RSUs and the DT of 
RSUs, we first derive the optimization problem of RSUs and the DT of RSUs and 
then construct a Stackelberg game. We solve the average satisfaction maximization 
problem for vehicles and the global energy efficiency maximization problems for the 
DT of RSUs by using the classic ADMM and Jacobian ADMM with two blocks, 
respectively. We obtain the resource allocation schemes of the two problems (the 
DT-driven classic ADMM and the DT-driven Jacobian ADMM). Furthermore, we 
model these two problems as a complete Stackelberg game. In the game, the RSUs’ 
DT is the leader and the RSUs are the follower. According to the goals of RSUs and 
the DT of RSUs, we can formulate the Stackelberg game as 


$$
\begin{aligned}
\text{Leader:}\quad & \underset{P}{\text{minimize}}\;\; E(P) \\
\text{Follower:}\quad & \underset{P_m,\,Q_m}{\text{minimize}}\;\; G_m\big(h_m(P_m, Q_m), \theta_m\big) \\
\text{s.t.}\quad & \sum_{n \in \mathcal{N}} p_{m,n} = f_m,\; m \in \mathcal{M} \quad (\text{C1}),
\end{aligned}
\tag{7.5}
$$


where $Q_m$ is the cumulative preference of all the vehicles for RSU $m$. The term 
$G_m(\cdot)$ includes the optimization direction of the DT of the RSUs and RSU $m$ and 
the compensation from the DT of the RSUs. The classic ADMM with two blocks 
is ill-suited to this kind of convex optimization problem with high-dimensional 
variables. The Jacobian ADMM-based algorithm is able to solve convex optimiza- 
tion problems by breaking them into smaller subproblems, making each part more 
tractable. Therefore, we use the game and Jacobian ADMM-based algorithm to solve 
the problem. The algorithm flow is shown in Fig. 7.2. 

In the beginning, the DT of the RSUs, as the leader, sends the incentive parameter 
$\theta_m$ to the corresponding RSU $m$, that is, the additional compensation of the DT of 
the RSUs to RSU $m$. We index the iterations of the outer loop by $k$. 
At iteration $k$, given incentive parameters $\{\theta_1, \ldots, \theta_m, \ldots, \theta_M\}$ from the leader, 
each RSU updates its own computing resource allocation scheme $P_m$ in the inner 
loop, and then the leader and the follower can reach the current optimal scheme. 
At the next iteration, $k + 1$, the leader will adjust the incentive parameters based 
on the updated $P_m$, $\forall m \in \mathcal{M}$. Then, a new current optimal scheme can be reached. 
When the outer iteration is terminated, the optimal incentive parameters and resource 
allocation scheme are the equilibrium point $(\theta^*, P^*)$ of the Stackelberg game. The 
proposed DT-driven game ADMM minimizes global energy consumption based on 
the premise of ensuring the satisfaction of the RSUs. 
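
The outer and inner loops described above can be sketched with a deliberately simplified toy model. This is not the chapter's algorithm: the Jacobian ADMM inner loop is replaced by a closed-form best response, the follower's objective is an assumed quadratic (stay close to a preference-driven target allocation, plus an incentive-weighted energy term), and the leader pays an assumed linear compensation `price * theta`. All names and numbers are hypothetical.

```python
def best_response(theta, targets, a, f_m):
    """Follower (one RSU): minimise 0.5*sum((p - t)^2) + theta*sum(a*p^2)
    subject to sum(p) == f_m, solved in closed form with a Lagrange multiplier."""
    inv = [1.0 / (1.0 + 2.0 * theta * a_n) for a_n in a]
    lam = (f_m - sum(t * w for t, w in zip(targets, inv))) / sum(inv)
    return [(t + lam) * w for t, w in zip(targets, inv)]


def energy(p, a):
    """Energy of one RSU: sum_n a_n * p_n^2, where a_n stands in for omega * c."""
    return sum(a_n * p_n * p_n for a_n, p_n in zip(a, p))


def leader_cost(theta, targets, a, f_m, price):
    """Leader's cost for one RSU: resulting energy plus the compensation paid."""
    return energy(best_response(theta, targets, a, f_m), a) + price * theta


def stackelberg(targets, a, f, price=0.05, step=0.5, outer_iters=100):
    """Outer loop: the leader adjusts each theta_m by a simple pattern search,
    re-evaluating the followers' best responses after every trial move."""
    theta = [0.0] * len(f)
    for _ in range(outer_iters):
        moved = False
        for m in range(len(f)):
            best = leader_cost(theta[m], targets[m], a[m], f[m], price)
            for cand in (theta[m] + step, max(0.0, theta[m] - step)):
                cost = leader_cost(cand, targets[m], a[m], f[m], price)
                if cost < best - 1e-12:
                    theta[m], best, moved = cand, cost, True
        if not moved:
            step *= 0.5          # refine the search once no move helps
            if step < 1e-4:
                break
    P = [best_response(theta[m], targets[m], a[m], f[m]) for m in range(len(f))]
    return theta, P


targets = [[2.0, 1.0, 1.0], [1.0, 3.0]]   # preference-driven allocations per RSU
a = [[0.5, 0.1, 0.1], [0.2, 0.4]]         # per-task energy coefficients
f = [4.0, 4.0]                            # frequency budget of each RSU
theta_star, P_star = stackelberg(targets, a, f)
```

With `theta = 0` each RSU simply serves its target allocation; as the leader raises the incentive, the allocation shifts towards cheaper tasks and the energy falls, while the budget constraint keeps holding for every RSU.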
7.3.4 Illustration of the Results 
Fig. 7.3 Convergence of the energy consumption of all RSUs over iterations 
Fig. 7.4 Vehicle satisfaction with RSUs over iterations under three schemes 


Figure 7.3 compares the energy consumption of three schemes, that is, the DT- 
driven Jacobian ADMM, the DT-driven game ADMM, and the scheme without 
DT, over the numbers of iterations. The scheme without DT allocates resources 
without the preference information that was obtained from the DT of the vehicles. 
The energy consumption of the scheme without DT is the highest and remains 
constant, because the tasks and CPU frequency can only be allocated randomly. This 
leads to a decision making and optimization process without iteration. The energy 
consumption of the DT-driven Jacobian ADMM is the lowest, since minimizing total 
energy consumption is its only objective at the cost of low vehicle satisfaction. The 
proposed DT-driven game ADMM jointly considers the overall energy efficiency 
and the satisfaction of the RSUs, and its energy consumption is thus higher than that 
of the DT-driven Jacobian ADMM. 

Figure 7.4 compares the vehicles’ satisfaction with the RSUs of three schemes, 
that is, the DT-driven classic ADMM, the DT-driven game ADMM, and the scheme 
without DT. Due to the contradictory goals of the RSUs and the DT of the RSUs, 
the DT-driven game ADMM attempts to balance between the two contradictory 
goals, and its satisfaction is a bit lower than that of classic ADMM. This is because 
RSUs allocate a great deal of resources to vehicles with high preferences, to provide 
satisfactory services for the vehicles. The satisfaction achieved by both DT-driven 
schemes, that is, the DT-driven classic ADMM and the DT-driven game ADMM, 
is much higher than that without DT. This is because, in the scheme without DT, the 
preferences of the vehicles for RSUs are unknown, and the allocation cannot fully 
meet the actual requirements of the vehicles. 


7.4 DT and Federated Learning for Aerial-Ground Networks 
7.4.1 A DT Drone-Assisted Ground Network Model 


Figure 7.5 shows a drone-assisted ground network scenario consisting of drones, 
ground clients, and DTs, where the drones provide supplementary capacity for ground 
communications during natural disasters or traffic peaks. Mobile drones with a 
wide range of coverage act as servers, responsible for task offloading, global model 
updates, and so forth. A wide variety of ground equipment, such as smartphones and 
laptops, serves as clients that perform tasks and connect with drones through wireless 
communications. 

The drone serving as the aggregator cooperates with the ground equipment serving 
as the trainers to perform federated learning tasks. The drone publishes a global 
model $w$, which all participating clients download. Then, each client uses its 
own private data sets to train the model and uploads the new weights or gradients 
to the server. This process is conducted iteratively until the entire training process 
converges [104, 74]. 
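
The training loop described above can be sketched in a few lines of Python. This is a minimal illustration only: it assumes plain averaging of the uploaded weights and a one-parameter least-squares model, and the names `local_update`, `aggregate`, and `federated_training`, along with the toy data, are hypothetical and not taken from the source.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair held privately by one client


def local_update(w: float, data: List[Sample], lr: float = 0.1, steps: int = 5) -> float:
    """Client side: a few gradient steps on private data for the model y = w * x."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def aggregate(local_weights: List[float]) -> float:
    """Aggregator (drone) side: plain average of the uploaded weights (assumed rule)."""
    return sum(local_weights) / len(local_weights)


def federated_training(client_data: List[List[Sample]], w0: float = 0.0, rounds: int = 20) -> float:
    """Repeat download -> local training -> upload -> aggregation for a fixed number of rounds."""
    w = w0
    for _ in range(rounds):
        uploads = [local_update(w, data) for data in client_data]
        w = aggregate(uploads)
    return w


# Toy data: every client privately holds samples of y = 2 * x.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(1.0, 2.0), (3.0, 6.0)],
    [(0.5, 1.0), (1.5, 3.0)],
]
w_global = federated_training(clients)
```

Only the weights leave each client; the raw samples stay local, which is the property the aerial-ground scheme relies on.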

Fig. 7.5 The architecture of a DT-empowered aerial-ground network 

The establishment of DT can capture the state of network elements in real time 
and effectively help the system make intelligent decisions. DT types include the DT 
of ground clients and the DT of the drone. The DTs of ground clients are deployed 
on a resource-rich ground node. The drone maintains the DT by exchanging 
information with the ground node instead of with all the clients. The set of clients in the 
network is $\mathcal{N} = \{1, 2, \ldots, N\}$. Client $i$'s DT at time $t$, $DT_i(t)$, can be expressed as 


$$DT_i(t) = \{F_i(w), b_i(t), f_i(t)\}, \tag{7.6}$$


where $w$ denotes the current training parameter of client $i$, $F_i(w)$ represents the 
current training state of client $i$, $b_i(t)$ represents the packet loss rate, and $f_i(t)$ is the 
CPU frequency of the client at time $t$. 

Due to the deviation of DTs, the packet loss rate deviation $\hat{b}_i(t)$ and the CPU 
frequency deviation $\hat{f}_i(t)$ can be measured as the errors of the DT mapping in the 
communication environment and computing power, respectively. For client $i$, the 
calibrated DT is 


$$\widehat{DT}_i(t) = \{F_i(w), b_i(t) + \hat{b}_i(t), f_i(t) + \hat{f}_i(t)\}. \tag{7.7}$$
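
As a rough illustration of the calibration step, the client twin can be held in a small record and corrected by adding the measured deviations to the mapped values. The class name `ClientTwin`, the function `calibrate`, and the numbers below are hypothetical; only the three fields (training state, packet loss rate, CPU frequency) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientTwin:
    training_state: float  # current training state of the client
    packet_loss: float     # mapped packet loss rate
    cpu_freq: float        # mapped CPU frequency in Hz


def calibrate(twin: ClientTwin, loss_dev: float, freq_dev: float) -> ClientTwin:
    """Return the calibrated twin: mapped values plus their measured deviations."""
    return ClientTwin(
        training_state=twin.training_state,
        packet_loss=twin.packet_loss + loss_dev,
        cpu_freq=twin.cpu_freq + freq_dev,
    )


mapped = ClientTwin(training_state=0.30, packet_loss=0.05, cpu_freq=2.0e9)
calibrated = calibrate(mapped, loss_dev=0.01, freq_dev=-1.0e8)
```

The training state is left untouched because the text attributes the mapping error only to the communication environment and the computing power.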


The DT of the drone manages the deviation of the DTs of the clients and has a 
preference for the clients. Drone $j$'s model is 


$$DT_j(t) = \{P(t), D(t)\}, \tag{7.8}$$


where $P(t)$ is the reputation distribution of nodes within its coverage area, and $D(t)$ 
is the set of deviations between the clients' local updates and the global update. 


7.4.2 Contribution Measurement and Reputation Value Model 


Update significance can intuitively measure the contribution of a local model update 
to the global model update. The update significance is measured by the model 
deviation $d_i^t$, which is the divergence of a particular local model from the average 
across all local models. A small $d_i^t$ reflects a high quality of the parameters uploaded by 
client $i$. The aggregator updates the value of $d_i^t$ for client $i$ in each time slot, as a 
basis for the quality evaluation of the parameters submitted by client $i$. 
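
A minimal sketch of this measurement follows. The text does not state which divergence is used, so the Euclidean distance to the element-wise mean is an assumption here, and the function name `model_deviations` is hypothetical.

```python
import math
from typing import List, Sequence


def model_deviations(local_models: Sequence[Sequence[float]]) -> List[float]:
    """Distance of each local parameter vector from the average of all local vectors."""
    n = len(local_models)
    dim = len(local_models[0])
    mean = [sum(model[k] for model in local_models) / n for k in range(dim)]
    return [
        math.sqrt(sum((model[k] - mean[k]) ** 2 for k in range(dim)))
        for model in local_models
    ]


# Three clients upload two-parameter models; the third one is an outlier.
deviations = model_deviations([[1.0, 1.0], [1.0, 1.0], [4.0, 1.0]])
```

In this toy case the outlier receives the largest deviation and would therefore be rated as the lowest-quality update.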




The reputation of a client can also affect the training process. Through the 
reputation model, high-performance clients should be identified in terms of sufficient 
communication resources, powerful computing capabilities, and accurate training 
results. We use $P = (p_1, p_2, \ldots, p_N)$ to represent the reputation value of each 
client. According to subjective logic, the reputation value model is related to the 
communication capability of node $i$ during the $t$th global update and the learning 
quality $d_i^t$. 


7.4.3 Incentive for Federated Learning Utility Maximization 


Static and dynamic incentives are designed for small-scale networks and large-scale 
networks, respectively [37]. In a small-scale network, a single drone can cover all 
the clients. Therefore, we first design a static incentive mechanism. The term $\tau_i$ 
represents the decision of client $i$, that is, the number of rounds in which the client 
participates in the global update; $\tau = (\tau_1, \tau_2, \ldots, \tau_N)$ represents the strategies for 
all the clients; and $\tau_{-i} = (\tau_1, \ldots, \tau_{i-1}, \tau_{i+1}, \ldots, \tau_N)$ denotes the training strategies 
of all the clients except for client $i$. Given the computing cost per round (a complete 
global update round) $C = (c_1, c_2, \ldots, c_N)$ and the communication cost per round 
$K = (k_1, k_2, \ldots, k_N)$, the static incentive utility function is the difference between 
the reward and loss of client $i$, which can be defined by 


$$U_i(\tau_i, \tau_{-i}) = \frac{p_i \tau_i}{\sum_{j \in \mathcal{N}} p_j \tau_j} R - \tau_i c_i - \tau_i k_i. \tag{7.9}$$
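
The client utility can be evaluated directly from its definition: a reputation-weighted share of the reward minus the computing and communication costs of the rounds played. The sketch below assumes that reading of (7.9); the function name `client_utility` and the sample values are hypothetical.

```python
from typing import Sequence


def client_utility(
    i: int,
    tau: Sequence[float],   # rounds chosen by every client
    p: Sequence[float],     # reputation values
    c: Sequence[float],     # computing cost per round
    k: Sequence[float],     # communication cost per round
    reward: float,          # total reward R offered by the aggregator
) -> float:
    """Reward share of client i minus its computing and communication costs."""
    weighted = sum(pj * tj for pj, tj in zip(p, tau))
    if weighted == 0:
        return 0.0  # nobody participates, so nothing is earned or spent
    share = p[i] * tau[i] / weighted
    return share * reward - tau[i] * c[i] - tau[i] * k[i]


u0 = client_utility(0, tau=[2.0, 2.0], p=[0.5, 0.5], c=[1.0, 1.0], k=[0.5, 0.5], reward=10.0)
```

With two identical clients each receives half of the reward (5), pays 2 for computing and 1 for communication, and ends with a utility of 2.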


The utility function of the aggregator is the total energy consumption of clients 
in the learning process minus the payment of the aggregator. The static incentive 
utility function is defined as 


$$U_0(R) = \sum_{i \in \mathcal{N}} p_i \tau_i c_i - aR^2, \tag{7.10}$$


where $a > 0$ is a system parameter to ensure that the utility is greater than or equal 
to zero under the optimal $R^*$. 

In a large-scale case, it is difficult for a single drone to cover the entire area. 
Therefore, a dynamic incentive mechanism can be designed to select the optimal 
clients in adaptation to the time-varying environment. The difference from the static 
incentive is that $C$ in the dynamic incentive represents the computing cost of the 
client to complete a round of local training. In the dynamic scene, we use $r^t$ instead 
of $R$ in the formula, where $r^t$ represents the reward determined by the drone before 
the $t$th global model update. For convenience, in the following analysis, we 
uniformly use $R$ to express the reward. 

The decision making problem can be modelled using the Stackelberg game. In 
the game, the DT of the drone acts as the leader, while the ground clients are the 
followers. The game consists of two stages. In the first stage, the aggregator publishes 
the task and determines its reward $R$. In the second stage, each client devises 
a strategy to determine the number of rounds to participate in federated learning and 
maximize its own utility [105]. The second stage of the Stackelberg game 
is a noncooperative game in which a Nash equilibrium exists. A set of 
strategies $\tau^* = (\tau_1^*, \tau_2^*, \ldots, \tau_N^*)$ is a Nash equilibrium in the second stage of the 
game if, for any client $i$, $U_i(\tau_i^*, \tau_{-i}^*) \ge U_i(\tau_i, \tau_{-i}^*)$, $\forall \tau_i \ge 0$. Under the reward $R$ 
given by the aggregator, no client can gain any additional benefits by unilaterally 
changing the current strategy. 
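
The equilibrium condition can be checked numerically for a candidate strategy profile by trying unilateral deviations. The sketch below assumes the client utility is the reputation-weighted reward share minus per-round costs, and it only tests deviations drawn from a finite grid, so it is a sanity check rather than a proof. The names `utility` and `is_nash_equilibrium` are hypothetical.

```python
from typing import List, Sequence


def utility(i: int, tau: Sequence[float], p: Sequence[float], c: Sequence[float],
            k: Sequence[float], reward: float) -> float:
    """Assumed client utility: reward share minus computing and communication costs."""
    weighted = sum(pj * tj for pj, tj in zip(p, tau))
    if weighted == 0:
        return 0.0
    return p[i] * tau[i] / weighted * reward - tau[i] * (c[i] + k[i])


def is_nash_equilibrium(tau: Sequence[float], p: Sequence[float], c: Sequence[float],
                        k: Sequence[float], reward: float,
                        candidates: Sequence[float], tol: float = 1e-9) -> bool:
    """True if no client gains more than tol by switching alone to any candidate value."""
    for i in range(len(tau)):
        current = utility(i, tau, p, c, k, reward)
        for alternative in candidates:
            trial: List[float] = list(tau)
            trial[i] = alternative
            if utility(i, trial, p, c, k, reward) > current + tol:
                return False
    return True


grid = [n * 0.05 for n in range(101)]  # deviations between 0 and 5 rounds
p, c, k = [1.0, 1.0], [1.0, 1.0], [0.0, 0.0]
stable = is_nash_equilibrium([1.0, 1.0], p, c, k, reward=4.0, candidates=grid)
unstable = is_nash_equilibrium([0.5, 1.0], p, c, k, reward=4.0, candidates=grid)
```

For two identical clients with unit cost and a reward of 4, the profile (1, 1) passes the check, while (0.5, 1) fails because the first client would gain by increasing its participation.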

At a Nash equilibrium, when all the other clients play their best strategies, 
client $i$ can only play $\tau_i^*$. Therefore, we need to introduce the 
concept of the best response strategy. Given $\tau_{-i}$, a strategy is client $i$'s best response 
strategy, denoted by $\beta_i(\tau_{-i})$, if it maximizes $U_i(\tau_i, \tau_{-i})$ over all $\tau_i \ge 0$. To find the 
Nash equilibrium in the second stage of the game, a closed-form solution of the 
best response strategy for each client must be calculated. Accordingly, the whole 
game has a unique Stackelberg equilibrium if and only if there is a unique optimal 
solution in the first stage of the game. There 
exists a unique Stackelberg equilibrium $(R^*, \tau^*)$, where $R^*$ is the only value that 
can maximize the utility of the aggregator over $R \in [0, \infty)$. The utility function of 
the aggregator is a concave quadratic function on the difference between the reward 
and loss of client $i$, so the optimal $R$ can be solved by setting the first derivative 
of the utility function to zero. At this time, $(R^*, \tau^*)$ is the unique Stackelberg 
equilibrium in the game. 
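
To make the best response concrete, one closed form can be derived from the client utility, assuming it is the reputation-weighted reward share minus per-round costs. Writing $S$ for the reputation-weighted participation of the other clients and setting the derivative with respect to the client's own rounds to zero gives $\tau_i = (\sqrt{R p_i S / (c_i + k_i)} - S) / p_i$, truncated at zero. This derivation is worked out here and is not quoted from the source; the function name `best_response` is hypothetical.

```python
import math
from typing import Sequence


def best_response(i: int, tau: Sequence[float], p: Sequence[float], c: Sequence[float],
                  k: Sequence[float], reward: float) -> float:
    """Rounds that maximize client i's utility when the other clients' choices are fixed."""
    s_others = sum(p[j] * tau[j] for j in range(len(tau)) if j != i)
    if s_others <= 0:
        # With no competitor the utility has no maximizer over positive rounds.
        raise ValueError("best response is undefined when no other client participates")
    unit_cost = c[i] + k[i]
    root = math.sqrt(reward * p[i] * s_others / unit_cost)
    return max(0.0, (root - s_others) / p[i])


p, c, k = [1.0, 1.0], [1.0, 1.0], [0.0, 0.0]
reply = best_response(0, [0.3, 1.0], p, c, k, reward=4.0)       # other client plays 1 round
stay_out = best_response(0, [0.3, 10.0], p, c, k, reward=4.0)   # competitor already dominates
```

The client's own entry in `tau` is ignored, since a best response depends only on what the others do. When the competitor's participation is large enough, the formula turns negative and the client's best choice is not to participate.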

Different from the static mechanism, the dynamic mechanism selects clients 
according to the ratio of the unit local training computing cost to the reputation value, 
that is, $c_i / p_i$. The drone's optimal payment $R^*$ should be expressed as 
$R^* = \sum_{t=1}^{T_g} (r^t)^*$, where $T_g$ is the number of rounds of the global update. 
Finally, $\tau^*$ and $(r^t)^*$ constitute the unique equilibrium of the Stackelberg game. 
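
The ratio-based selection can be illustrated as a simple ranking. The sketch assumes that a lower cost-to-reputation ratio is preferred and that the drone can serve only a fixed number of clients per round; both the preference direction and the name `select_clients` are assumptions, not details given in the text.

```python
from typing import List, Sequence


def select_clients(costs: Sequence[float], reputations: Sequence[float], capacity: int) -> List[int]:
    """Indices of the clients with the lowest cost-to-reputation ratio, up to capacity."""
    ranked = sorted(range(len(costs)), key=lambda i: costs[i] / reputations[i])
    return ranked[:capacity]


# Ratios are 8, 2, 3 and 20, so clients 1 and 2 are chosen.
chosen = select_clients(costs=[4.0, 1.0, 3.0, 2.0], reputations=[0.5, 0.5, 1.0, 0.1], capacity=2)
```

Because reputations change after every global update, rerunning the ranking each round lets the selected set follow the time-varying environment.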


7.4.4 Illustration of the Results 


We use PyTorch 0.4.1 to build a federated learning model in an air-ground 
network and use the classic Modified National Institute of Standards and 
Technology (MNIST) data set to evaluate the performance of the proposed incentive 
mechanisms. We set up a total of 10 to 100 clients. Under the dynamic incentive, the 
communications range of a drone can only cover 20 clients at the same time. We 
employ a cost-only scheme as the benchmark where clients with low training costs 
are selected to participate in federated learning. 

Figure 7.6 shows the model's accuracy with varying global update rounds under 
three schemes. The convergence accuracy of the global model relies on the 
participating clients and their data quality. The accuracy under the dynamic incentive 
scheme is the highest. After each round of global updates, the performance of the 
clients will be evaluated, and the participation of low-quality clients will be reduced. 
Fig. 7.6 Comparison of model accuracy under varying global update rounds 


[Plot: total social welfare versus number of clients (10 to 100) for the benchmark, static incentive, and dynamic incentive schemes] 

Fig. 7.7 The total social welfare of a drone and clients varies with the number of clients 


The static scheme chooses the optimal client set, which might not be appropriate 
later in the federated learning process due to the mobility of the drone. Thus, the 
accuracy of the static incentive is lower than that of the dynamic incentive. Since the 
benchmark considers only the training costs of the clients, its model accuracy is 5% 
lower than that of the static scheme. 

Figure 7.7 shows how the total social welfare of the drone and clients varies with 
the number of clients under the three schemes. As the total number of clients increases, 
the total social welfare increases first, peaks at around 40 clients, and then decreases. 
With the increase of the client number, the utility of the drone increases, while the 
utilities of the clients decrease due to the greater number of competitors. In addition, 
because the benchmark selects only clients with low cost, its social welfare is higher 
than that of the static incentive and is the highest 
among the three schemes. 
Chapter 8 
Digital Twin for the Internet of Vehicles 


Abstract As a working combination of smart vehicles, advanced communication 
infrastructures, and intelligent transportation units, the Internet of Vehicles (IoV) has 
emerged as a new paradigm for safe and efficient urban life in the future. However, 
various types of smart vehicles with distinct capacities, diverse IoV applications with 
different resource demands, and unpredictable vehicular topology pose significant 
challenges to fully realizing IoV systems. To cope with these challenges, we leverage 
digital twin (DT) technology to model complex physical IoV systems in virtual space 
and to identify the relation between application characteristics and IoV services, which 
facilitates effective service scheduling and resource management. In this chapter, 
we discuss the motivation, benefits, and key issues of applying DT in IoV systems. 
Then, we use vehicular edge computing and caching as two typical IoV application 
scenarios to present DT-empowered task offloading and content caching scheduling 
schemes and their performance. 


8.1 Introduction 


Vehicles are undergoing a fundamental shift, from simple transportation units to 
smart ones empowered with environmental sensing, autonomous driving, and 
information interaction capabilities. Integrating such smart vehicles with pedestrians 
and the infrastructures around them gives rise to Internet of Vehicles (IoV) systems, 
which provide a range of powerful vehicular applications and lead to pioneering 
advances in safety and the efficiency of intelligent transportation. For instance, IoV 
helps to deliver information gathered from the urban traffic environment to adjacent 
vehicles for safe navigation and traffic management. In addition, IoV can provide 
real-time information and interactive entertainment for vehicle occupants. 

The development of IoV technology has received much attention in recent years, 
and high expectations have been raised about the benefits that its application will 
bring, prompting researchers and engineers to engage in in-depth discussions on 
possible obstacles on the IoV evolutionary path. 
The key feature of IoV is its massive connections and dynamic topology. As we 
mentioned, IoV is a network consisting of vehicles, drivers, pedestrians, roadside 
units (RSUs), and other intelligent units participating in traffic applications, which 
communicate in vehicle-to-vehicle (V2V), vehicle-to-RSU (V2R), vehicle-to-person, and 
vehicle-to-sensor modes. The mobility of vehicles and pedestrians can cause drastic 
changes in data transmission performance and even the interruption of communication 
links. The large scale of connected units and time-varying communication 
associations make IoV characteristic modelling and operation management highly 
complex and difficult. 

Another issue worth considering is the ultra-low latency constraints of some IoV 
applications. For example, in vehicle driving, when the vehicle in front brakes in an 
emergency, the following autonomous vehicle needs to complete the braking action 
within a few milliseconds according to the detected vehicle distance or the warning 
notification sent by the front vehicle. To meet such a strict delay constraint, the 
vehicle’s control, environmental perception, vehicular communication, and information 
processing must be comprehensively coordinated. 

The last issue to be addressed is closely related to the previous one. Different 
types of IoV applications rely on different forms of cooperative services from 
heterogeneous resources. For example, vehicular augmented reality needs to consume a 
great deal of computing and sensing resources, while onboard interactive entertainment 
mainly relies on communication and storage resources. Furthermore, synergy 
and competition exist between heterogeneous resource services. For instance, the 
premise of data processing is that the data can be transmitted to the corresponding 
processor node by communication resources, which may be in contention due to multiple 
vehicle communication pairs. The complex relation between these resources 
makes it challenging to efficiently implement IoV applications. 

Several technical approaches to the above challenges have emerged, with digital 
twin (DT), in particular, showing promise. By mapping physical IoV networks to 
virtual space, DT helps improve IoV application performance and resource efficiency. 
Some of the main benefits provided by DT to IoV are shown in Fig. 8.1 and are listed 
below. 

Accurate mapping and unified modelling: In DT-empowered IoV networks, DT 
servers collect road traffic status and application service characteristics from sensors 
installed on smart vehicles and through communication facilities spread throughout 
the vehicular network, to construct a real-time and accurate reflection of physical 
IoV networks. Since a reflection model in virtual space is represented by multidimensional 
digital parameters, irrelevant physical differences between various types 
of vehicles can be shielded by normalizing the feature parameters, to build a unified 
model that enables modelling interaction and migration. 

Feature digging and trend prediction: In the process of autonomous driving and 
on-board application services, vehicles can consume various resources, such as urban 
roads, vehicular communications, and edge computing. Therefore, collaboration, 
competition, and even social associations among multiple vehicles are generated. 
DT reflection helps to explore such potential features and relations in IoV systems. 
Fig. 8.1 Benefits provided by DT to IoV 


Going a step further, based on these relations, DT can predict future physical actions, 
states, and events in IoV systems, such as possible traffic congestion or collisions. 

Digital-physical two-way interaction: There is a two-way interaction between 
the DT model and the real physical entities of the IoV system. On the one hand, 
physical entities determine the digital mirroring. On the other hand, digital models 
logically guide physical action strategies. Both model accuracy and physical strategy 
performance can be improved during this iterative evolution process. 

Not restricted by time, space, or resources: In physical IoV networks, the safety 
predictions of vehicle driving behaviour, inter-vehicle communications, and resource 
cooperation between vehicles are restricted by event sequences, wireless transmission 
distances, and vehicle resource capacities, respectively. However, in the DT image 
of IoV, these constraints can be broken. For example, by dynamically changing the 
timeline, retrospective determinations and predictions of traffic events are convenient 
to make. In addition, in virtual space, communications between inaccessible vehicles 
can be realized by data sharing between vehicle model processes in a DT server. 

Motivated by the potential benefits of DT technology, a few works have addressed 
the incorporation of DT into IoV systems. In [106], the authors leveraged 
DT to facilitate collaborative and distributed autonomous driving. Based on vehicle 
DT models, driving decisions can be obtained at low cost. In [107], two DT models 
of vehicle driving states based on a Gaussian process and deep convolutional neural 
networks were respectively established that provide a scheme for the optimization 
of vehicle driving states and the realization of DT entity interactions. The authors 
in [108] introduced a DT-enabled edge intelligent cooperation scheme that guides 
optimal edge resource allocation and edge intelligent cooperation. Combining DT 
with vehicle-to-cloud communications, the authors in [109] presented a cooperative 
ramp merging system for connected vehicles that allows merging vehicles to cooperate 
with others prior to arriving at a merging zone. In [110], the authors focused on 
the security issues of cooperative intelligent transportation systems and constructed 
a DT model based on convolutional neural networks and support vector regression. 
Aided by the DT model, system security prediction accuracy was improved. 

Despite much promising recent work in the area of DT-empowered IoV, several 
questions remain open for further investigation, and are discussed below. 

Delays in DT modelling: Traffic safety is an important application scenario of 
DT-enabled IoV, in which some functions, such as early warnings of upcoming traffic 
accidents and adjustments of vehicle driving behaviour, have strict delay constraints. 
Meeting these constraints requires the DT model for the traffic environment and 
vehicle state to be constructed in a short time and to remain updated in real time. 
Considering the highly dynamic IoV topology and massive amounts of connected 
IoV nodes, the maintenance and tracking of such a complex system in real time are 
a challenge. 

Efficiency in DT modelling: Following the previous challenge, to reduce DT 
modelling delays, many resources need to be allocated for vehicular environment 
sensing, state information delivery, and modelling processing. However, in addition 
to serving in the construction and update of DT models, constrained IoV resources are 
also used to support vehicular communication, autonomous driving, and onboard 
multimedia applications. How to reduce the resource costs of DT modelling and 
improve DT efficiency has become an important issue to be investigated. 

Fault tolerance in DT modelling: The last but not least question concerns fault 
tolerance in DT modelling. Due to a limited sensing range, vulnerable wireless 
transmission parameters, and poor modelling processing power, established IoV DT 
models can have errors. These errors can seriously affect the control of vehicles’ 
driving action and mislead the prediction of road traffic trends, thereby undermining 
the safety and efficiency of road traffic. In a harsh IoV environment, how to construct 
a DT model with high fault tolerance is still an unexplored problem. 


8.2 DT for Vehicular Edge Computing 


Driven by advances in vehicular communication and sensing and processing 
capabilities, many powerful IoV applications have emerged, such as autonomous 
driving, smart logistics, and driving augmented reality. However, the implementation 
of these applications requires intensive computation for environmental information 
processing and obtaining traffic behaviour under strict delay constraints, posing great 
challenges for vehicles with limited onboard computing resources. 

Vehicular edge computing (VEC), which enables computing resource sharing at the 
edge of vehicular networks, is an appealing paradigm for meeting the intensive 
computation demands. 
In VEC, resource-hungry vehicles can offload their computing tasks to other smart 
vehicles or an RSU with spare computing power. However, to achieve efficient task 
offloading in such a dynamic and complex IoV environment, key issues still need 
to be addressed. For example, the communication scheduling for task data delivery 
is closely related to the computing resource management for task processing, 
which makes task offloading complicated. Moreover, resource competition between 
different offloading vehicle pairs, as well as the time-varying topology of vehicular 
networks, introduces further unprecedented challenges in managing VEC. 

Recent advancements in machine learning provide significant capabilities to 
perceive the dynamic IoV environment, determine action strategies, and tackle complex 
problems that arise in VEC applications. However, the effective implementation of 
the learning approach always relies on accurate and real-time system information 
gathered by learning agents. In vehicular networks characterized by massive amounts 
of connected smart vehicles, a highly dynamic topology, and a limited wireless 
spectrum, it is impractical to form a centralized artificial intelligence (AI) manager 
that schedules edge services for the entire network. To address this problem, we turn to 
multi-agent distributed learning empowered vehicular edge management. However, 
efficient collaboration and joint decision optimization among these multiple agents 
still face critical challenges. 

DT is a promising technology to address these challenges. DT’s state mapping 
between real and virtual dimensions provides users with comprehensive insights 
into the investigated system and dramatically reshapes the design and engineering 
process. Merging DT with machine learning will generate great benefits. On the one 
hand, DT provides AI with comprehensive and accurate system state information, 
which is exactly what learning processes require. On the other hand, AI provides 
much intelligence to DT, making its information collection and system description 
smart and efficient. 

In this section, we propose a new VEC network based on DT and multi-agent 
learning that improves agent collaboration and optimizes task offloading efficiency 
[72]. In this network, DT is leveraged to reveal the potential cooperation between 
different vehicles and adaptively form multi-agent learning groups, which reduces 
learning complexity. Moreover, we design a distributed multi-agent learning scheme 
that minimizes vehicular task offloading costs under strict delay constraints in 
complex vehicular networks and dynamically adjusts the state-mapping mode of the DT 
network (DTN). 


8.2.1 System Model 


Figure 8.2 shows the framework of a DT-empowered VEC network. There are $N$ 
smart vehicles on the road. These vehicles are equipped with computing power to 
process tasks and perform learning functions. The computing capability of vehicle $i$, 
$i \in \mathcal{N}$, is denoted as $f_i$ CPU cycles per second. To enable powerful vehicular 
applications, such as autonomous driving and onboard entertainment, vehicles generate 
various types of tasks to be processed. Without loss of generality, we consider vehicle 
$i$ to have $J_i$ types of tasks, and task $w_{i,j}$ is described in the form of three elements, as 
$w_{i,j} = (C_{i,j}, D_{i,j}, T_{i,j}^{\max})$. Here, $C_{i,j}$ is the amount of computing resources required 
to execute the task, $D_{i,j}$ represents the size of the task input data, and $T_{i,j}^{\max}$ is the 
maximum delay that task $w_{i,j}$ can tolerate. 
Fig. 8.2 A DT-empowered VEC network 


Since different vehicles have diverse computing capabilities and task processing 
requirements, some vehicles can have sufficient computing resources, whereas 
others are lacking. Through V2V communication, one vehicle can offload its tasks 
to others. We call the target vehicles "vehicular edge servers". Let $\beta_{i,j,k} = 1$ denote 
that vehicle $i$ offloads its task $j$ to vehicular server $k$, and $\beta_{i,j,k} = 0$ denote that 
it does not. The time consumed to complete task 
$w_{i,j}$ is divided into two parts, namely, the offloading task transmission time and the 
task execution time. The transmission time of task $w_{i,j}$ from vehicle $i$ to vehicle $k$ through 
channel $l$ is $T_{i,j,k,l}^{\mathrm{tran}} = D_{i,j} / R_{i,k,l}$, where $R_{i,k,l}$ is the transmission rate. 

A target vehicular server can receive multiple tasks from the other vehicles, and 
it puts these tasks in a queue. Taking into account task delay constraints, the target 
server executes the tasks in order according to the length of remaining time, from 
shortest to longest. Consequently, a task’s execution time consists of the waiting time 
in the queue and the processing time in the CPU. The execution time of $w_{i,j}$ can be 
presented as 


$$T_{i,j,k}^{\mathrm{exe}} = \sum_{i'=1}^{N} \sum_{j'=1}^{J_{i'}} \beta_{i',j',k} \, \mathbb{1}\{T_{i',j'}^{\mathrm{rem}} \le T_{i,j}^{\mathrm{rem}}\} \, C_{i',j'} / f_k, \tag{8.1}$$


where $\mathbb{1}\{\cdot\}$ is an indicator function that equals one if its argument is true, and zero otherwise, 
and $T_{i,j}^{\mathrm{rem}}$ is the remaining time of task $w_{i,j}$ before the deadline. 
To improve vehicular computing resource utilization, a price-based incentive 
mechanism is incorporated into the resource scheduling. For a vehicular server, the 
weaker its computing power, the greater the resource demands of its queuing tasks, 
and the tighter the tasks' delay constraints, the higher the price of the resources it provides 
for guest tasks. We denote the price of a unit of computing resource of vehicle $i$ as 
$z_i$. 

In the vehicular edge system, a DTN continually maps the vehicles’ physical states, 
such as the communication topology and computing resource demands, to virtual 
digital space. With the help of the DTN, edge service optimization and resource 
allocation strategies can be efficiently obtained. 
8.2.2 DT and Multi-Agent Deep Reinforcement Learning for VEC 


Merged with DT, AI learning gains comprehensive state information and effective 
guidance for agent learning, while helping DT to accurately model the physical 
system. We investigate the incorporation of DT and multi-agent learning in VEC 
networks and propose optimal edge service scheduling schemes. The main framework 
of these schemes is shown in Fig. 8.3. 

Fig. 8.3 Incorporation of DTN and multi-agent learning for VEC 


Owing to the large-scale distribution of massive numbers of vehicles, it is costly 
and impractical to globally schedule the task offloading of the whole edge network. 
To address this issue, we leverage a DTN and gravity model to design an edge 
service aggregation scheme that efficiently aggregates vehicles based on the potential 
matching relations between the supply and demand of computing resources and 
greatly reduces the complexity of task offloading scheduling. 

To guide the edge service aggregation, DTNs of the vehicular edge network are 
constructed in the RSUs. A DTN can be regarded as a combination of logical models 
and parameters recorded in digital space to characterize the states of the objects in 
physical space. We define the element of a DTN as $D_\omega = \{M, \Phi, \omega\}$. Here, $M$ 
denotes the digital model of the vehicles in the physical system, which is described 
by a vehicle task set $\{w_{i,j}\}$, a computing capability set $\{f_i\}$, a resource price set 
$\{z_i\}$, and an available transmission rate set $\{R_{i,k,l}\}$. The modelling parameters are 
$\Phi = \{\phi_1, \phi_2, \phi_3\}$, which reflect the importance of the three factors of resources, 
pricing, and communication in the DTN modelling, respectively. The values of 
the parameters update periodically, and $\omega$ is the sequence number of the mapping 
periods. 

With the aid of the DTN, we develop a gravity model-based vehicle aggregation 
scheme. Here we reform the gravity model and make it suitable to characterize the 
supply and demand relations of the vehicular edge service. The gravitation in the 
service association between vehicles i and i’ is calculated as 
According to the gravitation obtained in (8.2), we split the vehicles into multiple 
aggregation groups, which are denoted as $\{V_g\}$. Based on this aggregation, we 
leverage a multi-agent learning approach to optimize edge resource allocation. Since 
the vehicles in the edge network have computing and communication capabilities, 
they can act as agents to learn the optimal edge scheduling strategies. To minimize 
the task offloading costs under delay constraints, the optimization problem is given 
in the following form: 
where constraint C1 ensures that a task can be offloaded to at most one vehicle for 
processing, C2 indicates that task offloading occurs only between vehicles belonging 
to the same aggregation group, and C3 shows that the time consumption, including 
the transmission and execution time, should be within the delay constraints of the 
tasks. Problem (8.3) is an integer programming problem and has been proved to be 
NP-complete. 


Let $U_{V_g} = \sum_{i \in V_g} \sum_{j=1}^{J_i} \sum_{k \in V_g} \beta_{i,j,k} \sum_{l \in L} \delta_{i,j,k,l} C_{i,j} z_k$. The target function of (8.3) 
can be written as $\min \sum_{V_g \in \mathcal{V}} U_{V_g}$. According to C2, there is no offloading correlation 
between the different aggregation groups. Thus, to address problem (8.3), we 
turn to minimizing $U_{V_g}$ by adopting a multi-agent deep deterministic policy gradient 
(MADDPG) learning approach, where $V_g \in \mathcal{V}$. The number of learning iterations 
is represented by the time slot $t$. For vehicle $i$ belonging to aggregation group $V_g$, its 
action taken at time slot $t$ is $a_i^t = \{\beta_{i,j,k}^t, \delta_{i,j,k,l}^t\}$, where $i, k \in V_g$, $j \in J_i$, and $l \in L$. 
Then, the action set of the multiple agents is given as $A^t = \{a_i^t\}$. The state at time 
slot $t$ can be presented as $S^t = \{T_{i,j}^{\mathrm{rem},t}, \mathcal{T}_k^t\}$, where $T_{i,j}^{\mathrm{rem},t}$ and $\mathcal{T}_k^t$ are the remaining 
completion time of the task $w_{i,j}$ and the set of tasks that have been queued for 
processing in vehicle $k$ in time slot $t$, respectively. Taking action $A^t$ in state $S^t$, the 
learning system of $V_g$ gains the reward 
The main goal of multi-agent learning in group $V_g$ is to find the optimal action 
strategy for the agents to minimize the group’s task offloading costs, presented as 
where $\xi$ is a discount coefficient that indicates the effect of a future reward on the 
current actions, and $0 < \xi < 1$. 
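
The role of the discount coefficient can be illustrated with a short computation. The objective itself is not reproduced in this text, so the sketch assumes the standard discounted sum of per-slot rewards used in reinforcement learning; the function name `discounted_return` and the reward values are hypothetical.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], discount: float) -> float:
    """Sum of rewards where the reward t slots ahead is weighted by discount ** t."""
    if not 0.0 < discount < 1.0:
        raise ValueError("discount coefficient must lie strictly between 0 and 1")
    total = 0.0
    for reward in reversed(rewards):
        total = reward + discount * total
    return total


# Three slots with unit reward: 1 + 0.5 * 1 + 0.25 * 1 = 1.75
value = discounted_return([1.0, 1.0, 1.0], discount=0.5)
```

A coefficient close to one makes the agents weigh long-term offloading costs almost as heavily as immediate ones, whereas a small coefficient makes them short-sighted.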

The DTN and the multi-agent learning system operate cooperatively in scheduling 
the vehicular edge service. On the one hand, the DTN determines the distributed 
learning environments of the multiple agents by aggregating vehicular groups under 
the guidance of the parameters $\Phi = \{\phi_1, \phi_2, \phi_3\}$. This aggregation improves the 
supply and demand matching of edge resources and reduces the multi-agent learning 
complexity. On the other hand, the multi-agent learning results, that is, the task 
offloading target selection and edge resource allocation, affect the vehicular edge 
service performance, and the performance indicators can be used in turn to evaluate 
the pros and cons of the aggregation mechanism and to adjust the aggregation parameter 
set $\Phi$. These two parts iteratively interact and update to adapt to the changes in 
application scenarios. 


8.2.3 Illustrative Results 


We evaluate the performance of our proposed vehicular edge task offloading schemes 
based on real traffic data sets, which are extracted from the historical mobility traces 
of taxi cabs in the San Francisco Bay area. There are approximately 500 cabs, and the 
average time interval for their GPS coordinate updates is less than 10 seconds [112]. 
To investigate the influence of traffic environment characteristics on the offloading 
scheme performance, we further divide the Bay Area into six square areas. We 
consider a scenario in which the computation capacities of the vehicles are randomly 
taken from (10, 20) units. The computation resource requirements, data size, and 
maximum tolerable latency of the tasks are randomly chosen from (30, 50) units, 
(5, 10) MB, and (0.5, 2) seconds, respectively. In addition, there are five orthogonal 
channels for offloading transmissions, and the bandwidth of each channel is 0.3 
MHz. 
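
The parameter ranges above can be sketched as a small scenario generator. This is an illustrative sketch only: the uniform distribution, the `Task` class, the helper names, the fixed seed, and the choice of one task per vehicle are assumptions, since the text states only that the values are randomly taken from the given intervals.

```python
import random
from dataclasses import dataclass


@dataclass
class Task:
    cpu_units: float      # computation resource requirement
    data_mb: float        # data size in MB
    max_latency_s: float  # maximum tolerable latency in seconds


def sample_vehicle_capacity(rng):
    """Computation capacity of one vehicle, drawn from (10, 20) units."""
    return rng.uniform(10, 20)


def sample_task(rng):
    """One task with the parameter ranges quoted in the text."""
    return Task(
        cpu_units=rng.uniform(30, 50),
        data_mb=rng.uniform(5, 10),
        max_latency_s=rng.uniform(0.5, 2),
    )


NUM_CHANNELS = 5             # orthogonal channels for offloading
CHANNEL_BANDWIDTH_MHZ = 0.3  # bandwidth of each channel

rng = random.Random(0)       # arbitrary fixed seed for repeatability
capacities = [sample_vehicle_capacity(rng) for _ in range(500)]
tasks = [sample_task(rng) for _ in range(500)]  # one task per vehicle (assumed)
print(len(capacities), len(tasks), NUM_CHANNELS * CHANNEL_BANDWIDTH_MHZ)
```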

Figure 8.4 shows the offloading costs under different scheduling schemes. Compared 
with the other two schemes, our proposed MADDPG scheme obtains the lowest cost. In 
the independent learning scheme, each vehicle works as an agent that perceives the 
edge service environment and takes self-interested offloading actions without 
interacting with the other agents. This independent decision-making approach can create 
a resource surplus or shortage between some vehicular service pairs, thereby undermining 
the offloading efficiency of the whole system. In the MADDPG scheme without aggregation, 
all the agents in the same area adopt joint decision making. Due to the complexity of 
the vehicle topology and potential service relations, it is difficult for this scheme 
to reach the optimal offloading strategy within a limited number of learning iterations. 
In contrast to these two schemes, the proposed MADDPG scheme aggregates vehicular 
agents based on DTN-aided edge service matching, which enables low-complexity 
multi-agent collaborative learning with efficient resource utilization and thus yields 
the lowest cost. 
Fig. 8.4 Comparison of offloading costs with different schemes 


Figure 8.5 presents the convergence of the MADDPG learning scheme. We ran- 
domly select two agents from areas 3 and 5, respectively. All the agents’ learning 
converges around 3,300 iterations. Furthermore, this figure demonstrates that the 
difference in edge network characteristics and aggregation groupings between the 
two areas has little effect on the convergence performance. 
Fig. 8.5 Convergence of MADDPG learning 


8.3 DT for Vehicular Edge Caching 


With the proliferation of smart vehicles and powerful IoV applications, huge amounts 
of highly diverse content need to be disseminated and shared between interactive 
vehicles under stringent delay constraints. However, due to limited spectrum resources, 
it is challenging for current wireless systems to deliver content while meeting such 
requirements, especially in heavy traffic scenarios with high vehicle density. 

Vehicular edge caching is a promising paradigm for addressing this issue. Edge caching 
places popular content close to end users via distributed cache vehicles and RSUs and 
considerably accelerates content acquisition from the edge, compared to fetching it 
from remote content providers. However, unstable communications and the highly dynamic 
topology between smart vehicles and RSUs still pose critical challenges in designing 
optimal caching schemes for vehicular edge networks. In practice, an individual edge 
cache server has constrained storage space, which can make it impossible for a single 
server to hold multiple large files at the same time. Moreover, when the cache servers 
are installed on several RSUs, the limited coverage range of an individual RSU can 
lead to short communication durations and small amounts of data delivered. 

To effectively utilize the constrained cache and communication resources with 
dynamic topology, cooperative caching needs to be leveraged, where content sub- 
scribers can be served by multiple caching servers. Moreover, to make full use of the 
caching capabilities of smart vehicles, social interactions among the vehicles can be 
utilized to improve content dispatch efficiency. The social characteristics of the ve- 
hicles are basically related to their drivers, who determine their content preferences 
and daily driving routines and affect the other vehicles that may be encountered on 
the road or in parking lots. 

Integrating socially aware smart vehicles into the mobile edge computing framework 
also raises challenges of its own. For instance, vehicular social characteristics are 
time varying and can change dynamically according to content popularity, traffic 
density, and vehicle speeds. Furthermore, owing to the mobility of vehicles, highly 
intermittent connectivity between vehicular content providers and subscribers can 
seriously undermine the efficiency of socially aware content transmission. In addition, 
the cooperation between vehicular cache resources needs to cater to road traffic 
distribution, channel quality, and content popularity. Thus, supporting delay-bounded 
content delivery over vehicular social networks with multiple cache-enabled smart 
vehicles is a challenge. 

DT technology can be used to address the above challenges. In socially aware 
vehicular edge caching networks, the DT approach can enable cache controllers to 
grasp the social relations between vehicles, understand the vehicle flow distribution, 
and effectively allocate communication and storage resources for content delivery. 
In this section, we propose a DT-empowered content caching mechanism for socially 
aware vehicular edge networks [111]. We present a DT-based vehicular edge caching 
framework that comprehensively captures vehicular social features and improves 
caching scheduling in highly dynamic vehicular networks. Moreover, by applying 
a deep deterministic policy gradient (DDPG) learning approach, we propose an 
optimal vehicular caching cloud formulation and edge caching resource arrangement 
that maximize the system’s utility in diverse traffic environments. 


8.3.1 System Model 


Figure 8.6 shows a DT-empowered vehicular social edge network. We consider an 
intelligent transport system in urban areas, where smart vehicles provide various 
powerful applications, such as smart navigation, online video, and interactive gaming. 
The implementation of these applications always requires content generated by 
the data centre, which is located in the core network. The required content is classified 
into $G$ types. Each type of content is described by three terms, as 
$T_g = \{f_g, t_g^{\max}, \mu_g\}$, $g \in G$, where $f_g$ is the size of content type 
$g$, $t_g^{\max}$ is its maximum delay tolerance, and $\mu_g$ is the delay sensitivity 
coefficient, which can be taken as the utility gained from a unit time reduction 
compared to $t_g^{\max}$ during the content delivery process. 
Fig. 8.6 A DT-empowered vehicular social edge network 
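
The role of the delay sensitivity coefficient can be illustrated with a short sketch. Reading the coefficient as the utility gained per unit of time saved relative to the maximum delay tolerance gives a linear utility in the delivery delay. The class `ContentType`, the function `delay_utility`, and the numerical values below are hypothetical and serve only as an example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentType:
    size_mb: float      # size of the content type
    max_delay_s: float  # maximum delay tolerance
    sensitivity: float  # utility gained per unit of time saved


def delay_utility(content, delivery_delay_s):
    """Utility from delivering content faster than its delay tolerance."""
    if delivery_delay_s > content.max_delay_s:
        raise ValueError("delivery violates the maximum delay tolerance")
    return content.sensitivity * (content.max_delay_s - delivery_delay_s)


# Hypothetical content type delivered one second before its deadline.
video = ContentType(size_mb=50.0, max_delay_s=2.0, sensitivity=0.25)
print(delay_utility(video, delivery_delay_s=1.0))  # prints 0.25
```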


To form access networks and provide data to vehicular content subscribers, $N$ 
RSUs are located along bidirectional roads that can receive content from the data 
centre and then relay it to the vehicles. The diameters of the regions covered by these 
RSUs are $\{L_1, L_2, \ldots, L_N\}$, respectively. Each RSU is equipped with an edge 
caching server. The caching capabilities of these servers are 
$\{C_1, C_2, \ldots, C_N\}$, respectively. To avoid long transmission latencies between 
the data centre and the vehicles, the servers can retrieve popular content from the 
centre and store it in their caches for later use. 

Besides being cached in RSUs, content can also be pre-stored in smart vehicles. 
Cache-enabled smart vehicles on the road act as content carriers and forward cached 
data to vehicles they encounter through V2V communication. To fully exploit V2V 
content delivery, vehicular social relations are leveraged in edge cache management. 
When the content supplied by one vehicle matches the content demanded by another and a 
communication link for data delivery can be established between them, we say that the 
vehicles are socially related. From this viewpoint, vehicular social relations are 
characterized by two elements. One element is the content matching between the supply 
and demand sides, and the other is the communication contact rate of the vehicles. We 
consider that the vehicles in this system demand the $G$ types of content with 
probabilities $\beta = \{\beta_1, \beta_2, \ldots, \beta_G\}$, respectively, where 
$\sum_{g \in G} \beta_g \le 1$. When a vehicle with type $g$ content in its cache is on 
the road, the probability of encountering a vehicle that needs exactly this type of 
content is $\beta_g$. Thus, the content-matching element can be described by the 
probability $\beta$. The communication contact rate is defined as the number of vehicles 
with which a given vehicle can be associated in a unit time while it is driving. 
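
The two elements of a social relation, content matching and contact rate, can be combined in a simple sketch. Assuming that each encountered vehicle demands a given content type independently with the stated matching probability (an assumption that is not made explicit in the text), the expected number of useful contacts per unit time is the product of the two. The function names and sample values are hypothetical.

```python
def expected_matching_contacts(contact_rate, match_prob):
    """Expected contacts per unit time that demand the cached content type.

    contact_rate: vehicles associated with the carrier per unit time.
    match_prob:   probability that an encountered vehicle needs this type.
    """
    if contact_rate < 0:
        raise ValueError("contact rate must be non-negative")
    if not 0.0 <= match_prob <= 1.0:
        raise ValueError("matching probability must lie in [0, 1]")
    return contact_rate * match_prob


def prob_at_least_one_match(num_contacts, match_prob):
    """Probability that at least one of num_contacts vehicles needs the type."""
    return 1.0 - (1.0 - match_prob) ** num_contacts


print(expected_matching_contacts(contact_rate=4.0, match_prob=0.25))  # prints 1.0
print(prob_at_least_one_match(num_contacts=2, match_prob=0.5))        # prints 0.75
```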


8.3.2 DT-Empowered Content Caching 
Fig. 8.7 DT and LSTM-based social model construction 


Figure 8.7 illustrates the main framework of the proposed DT and a social model 
construction approach based on long short-term memory (LSTM). The DTN con- 
sists of five modules, where the information collection module obtains vehicular 
network states from smart vehicles through V2R communication. The control mod- 
ule determines the update cycle and adjusts the data type and interactive frequency in 
information collection. The adjustment will be issued to the smart vehicles through 
the instruction output module, thereby changing the vehicles’ state sampling and 
reporting mode. After establishing the DT, which offers a virtual representation of 
the physical vehicular network, we use an LSTM recurrent network to extract the 
social features from the received data sets. 

We use $\psi_g(\varepsilon_g)$ to denote the accuracy of the social model that reflects 
the relations between the supply and demand of vehicles for type $g$ content, where 
$\varepsilon_g$ is the amount of system information gathered by the DT to train the LSTM 
network and obtain the social model $\{\beta, s_1, s_2\}$. The value of 
$\psi_g(\varepsilon_g)$ is the modulus ratio of the estimated social model parameters to 
those of the true model, and $0 \le \psi_g(\varepsilon_g) \le 1$. Since more information 
helps improve the model's accuracy, $\psi_g(\varepsilon_g)$ is a monotonically increasing 
function of $\varepsilon_g$. 

In the proposed vehicular edge caching network, to improve delivery time efficiency 
while reducing transmission costs, the content needs to be pre-stored in appropriate 
cache nodes. The caching arrangement depends on the vehicular social model obtained 
from the DT-empowered LSTM system, and the more information the DTN gathers, the 
higher the accuracy of this model. However, the information collection process incurs a 
V2R communication cost. Thus, the trade-off between the V2R communication cost and 
model accuracy, and its impact on the caching system utility, also needs to be 
considered in the cache scheduling. 
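
The trade-off between model accuracy and V2R collection cost can be made concrete with a toy example. The sketch below assumes a saturating accuracy curve, a linear collection cost, and a utility that scales with accuracy. None of these functional forms or constants comes from the text; they only satisfy the stated properties that accuracy lies between 0 and 1 and increases monotonically with the amount of gathered information.

```python
import math


def model_accuracy(eps, k=0.05):
    """Hypothetical accuracy curve: rises monotonically from 0 toward 1."""
    return 1.0 - math.exp(-k * eps)


def net_utility(eps, base_utility=100.0, cost_per_unit=1.0):
    """Accuracy-scaled caching utility minus a linear V2R collection cost."""
    return base_utility * model_accuracy(eps) - cost_per_unit * eps


def best_information_amount(eps_max, step=1.0):
    """Grid search over [0, eps_max] for the amount maximizing net utility."""
    candidates = [i * step for i in range(int(eps_max / step) + 1)]
    return max(candidates, key=net_utility)


print(best_information_amount(eps_max=100.0))
```

With these assumed constants the net utility peaks at an interior point: gathering less information leaves the model inaccurate, while gathering more costs more than the accuracy gain is worth.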

Let $x_g$ and $Y_g = \{y_{g,1}, y_{g,2}, \ldots, y_{g,N}\}$ denote the probabilities of 
pre-storing type $g$ content in the vehicular caching cloud and in the caching servers 
equipped on RSUs, respectively. The size of the content segment cached in a vehicle is 
$Q_g$. The proposed optimal edge caching problem, referred to as problem (8.6), 
maximizes the utility of the caching system under the constraints of node cache capacity 
and content delivery delay. In its formulation, $V_1$ and $V_2$ denote the sets of the 
content provider and subscriber vehicles in an area, respectively; an influence function 
of $\varepsilon_g$ represents the impact of the social model deviation caused by different 
amounts of gathered information on the system's utility; and a probability term describes 
the likelihood that vehicle $v'$ is located within the coverage of RSU $n$ and obtains 
type $g$ content from the cache server equipped on this RSU in V2R mode. 

In (8.6), the first two constraints specify the range of the caching probabilities. 
Constraints C3 and C4 guarantee that the amount of content on a vehicle and on 
an RSU server does not exceed the maximum storage capacity of the respective 
caching node. Constraints C5 and C6 ensure that the time cost for type $g$ content 
remains within its delay constraint. Constraint C7 indicates that the size of the content 
segment cached in a vehicle should not exceed the upper limit. The last constraint ensures 
that the amount of information related to type $g$ content is positive and that the total 
amount of gathered information does not exceed the maximum threshold 
$\varepsilon^{\max}$. 
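
As an illustration of the probability-range and RSU capacity constraints, the sketch below checks a candidate caching arrangement. It assumes that the load on an RSU is the sum over content types of the caching probability times the content size. This form is a guess made for the example, not the constraint as written in (8.6), and the function name and data are hypothetical.

```python
def rsu_cache_feasible(y, sizes_mb, capacities_mb):
    """Check an assumed form of the RSU capacity constraint.

    y[g][n]          : probability of pre-storing type g content on RSU n
    sizes_mb[g]      : size of type g content
    capacities_mb[n] : cache capacity of RSU n
    """
    # Every caching probability must lie in [0, 1].
    for row in y:
        if any(not 0.0 <= p <= 1.0 for p in row):
            return False
    # The expected cached volume on each RSU must fit its capacity.
    for n, capacity in enumerate(capacities_mb):
        load = sum(y[g][n] * sizes_mb[g] for g in range(len(sizes_mb)))
        if load > capacity:
            return False
    return True


sizes = [100.0, 50.0]        # two hypothetical content types
capacities = [300.0, 120.0]  # two hypothetical RSUs
print(rsu_cache_feasible([[1.0, 1.0], [1.0, 0.5]], sizes, capacities))  # False
print(rsu_cache_feasible([[1.0, 0.5], [1.0, 0.5]], sizes, capacities))  # True
```

In the first arrangement the second RSU would hold 125 MB against a capacity of 120 MB, so the check fails. Halving the caching probability of the larger content type on that RSU makes the arrangement feasible.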

In the proposed optimal caching problem, the edge cache scheduling relies on the 
social model built, while, in the model construction, the adjustment of information 
collection depends on its effect on the system's utility. Moreover, due to possible 
content segmentation and cache resource sharing, there is a strong correlation 
between the various types of content cached in heterogeneous edge caching nodes. 
These features make problem (8.6) challenging to solve. To address this 
issue, we propose a DDPG learning-based iterative approach. In each iteration, we 
first obtain the cache scheduling strategies according to a given social model and 
then modify the amount of information gathered in model construction based on 
the determined caching strategies. The iteration continues until the system's utility 
converges. 
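
The structure of this alternating iteration can be sketched as follows. The sketch replaces the DDPG learning step with simple closed-form stand-ins so that the loop is runnable. The accuracy curve, the scheduling rule, the utility function, and all constants are hypothetical and are not the authors' method. Only the control flow (schedule for a fixed model, then adjust the gathered information, repeat until the utility converges) follows the description above.

```python
import math


def accuracy(eps):
    """Hypothetical monotonically increasing model accuracy in [0, 1)."""
    return 1.0 - math.exp(-0.05 * eps)


def schedule_cache(acc):
    """Stand-in for the learning step: caching probability for a given accuracy."""
    return min(1.0, 0.2 + 0.8 * acc)


def system_utility(x, eps):
    """Toy utility: accuracy-scaled caching gain minus a linear V2R cost."""
    return 100.0 * x * accuracy(eps) - 1.0 * eps


def adjust_information(x, eps, eps_max, step=1.0):
    """Move eps by at most one step toward higher utility for a fixed x."""
    candidates = [e for e in (eps - step, eps, eps + step) if 0.0 <= e <= eps_max]
    return max(candidates, key=lambda e: system_utility(x, e))


def iterate(eps=10.0, eps_max=100.0, tol=1e-6, max_iters=1000):
    """Alternate scheduling and information adjustment until utility converges."""
    prev = -math.inf
    for _ in range(max_iters):
        x = schedule_cache(accuracy(eps))          # step 1: fixed social model
        eps = adjust_information(x, eps, eps_max)  # step 2: fixed caching strategy
        util = system_utility(x, eps)
        if abs(util - prev) < tol:
            break
        prev = util
    return x, eps, util


x, eps, util = iterate()
print(x, eps, util)
```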


8.3.3 Illustrative Results 


We evaluate the performance of the proposed DT-empowered and socially aware 
edge caching schemes based on vehicular traffic data sets gathered in different areas. 
We consider a scenario in which one to three RSUs are randomly located in each area. 
The data storage capacity of the cache server equipped on each RSU is randomly set 
within the interval (300, 700) MB. There are 10 types of content requirements, of 
which the content size, maximum delay tolerance, and delay sensitivity coefficient 
are randomly chosen from (10, 100) MB, (0.5, 3) seconds, and (0.1, 0.3), respectively. 
Fig. 8.8 Comparison of the caching utilities of multiple areas under different schemes 


Figure 8.8 compares the utilities of multiple areas under different edge caching 
scheduling schemes. Our proposed DT-empowered learning approach achieves the highest 
utility in all the urban areas. The greedy approach, which obtains the lowest utility, 
arranges the content storage in the edge cache nodes only according to content 
popularity. It ignores the social relations between smart vehicles and thus fails to 
make full use of the communication contacts between vehicles to implement V2V data 
delivery. In contrast, the socially aware learning scheme takes the content delivery 
among vehicles directly into account and dynamically allocates cache and communication 
resources based on the content requirements and known environmental characteristics, 
thus achieving higher utility. However, its social feature perception mode is fixed, 
which can increase detection costs or reduce perception accuracy. Unlike these two 
schemes, our proposed scheme leverages DT to reflect the vehicular network states while 
adaptively adjusting the social model construction strategy to balance accuracy and 
cost, thus resulting in the highest caching utility. 
Fig. 8.9 Comparison of the content acquisition delays of multiple areas under different schemes 


Figure 8.9 compares the content acquisition delays under different schemes in 
multiple areas. Our proposed DT-empowered learning scheme outperforms the other 
two approaches. Because this scheme exploits vehicular social relations and caching 
capacity to enable direct data delivery between vehicles, the content acquisition delay 
is reduced. In a few areas, such as area 3 in Fig. 8.8 and area 4 in Fig. 8.9, the 
performance of the DT-empowered learning scheme is close to that of the socially aware 
learning approach. Averaged over all areas, however, the DT-empowered scheme increases 
the utility by 17% and decreases the delay by 10% relative to the simple socially aware 
scheme. Since both these schemes leverage vehicular social relations to schedule cache 
resources, the difference in their performance is smaller than the gap between the 
socially aware schemes and the greedy approach, which ignores vehicular social relation 
effects. Moreover, the performance gain provided by the DT mechanism is affected by the 
different vehicle distributions, driving states, and caching capacities in the various 
areas, so the gain from DT differs between areas. 